Optimize gemv_n_sve kernel #5157

manaalmj · 2025-02-28T10:22:07Z

This PR unrolls the loops of the sgemv_n kernel using svmla_lane, with a main loop and a tail loop.
The graph below shows the performance improvement for OMP_NUM_THREADS=1 on Graviton-3:

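For readers unfamiliar with the main-loop/tail-loop structure mentioned in the description, here is a minimal portable sketch in scalar C. It is not the code from this PR: the real kernel uses SVE intrinsics such as svmla_lane, and the function name, the unroll factor of 4, and the column-major layout with leading dimension lda are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar sketch of y += alpha * A * x for a column-major
 * m x n matrix A with leading dimension lda. The unroll factor (4) is an
 * assumption for illustration, not taken from the PR. */
void sgemv_n_unrolled_sketch(size_t m, size_t n, float alpha,
                             const float *a, size_t lda,
                             const float *x, float *y)
{
    size_t j = 0;

    assert(lda >= m);

    /* Main loop: consume four columns of A per pass, so each y[i] is
     * loaded and stored once per four multiply-adds. */
    for (; j + 4 <= n; j += 4) {
        const float x0 = alpha * x[j + 0];
        const float x1 = alpha * x[j + 1];
        const float x2 = alpha * x[j + 2];
        const float x3 = alpha * x[j + 3];
        const float *a0 = a + (j + 0) * lda;
        const float *a1 = a + (j + 1) * lda;
        const float *a2 = a + (j + 2) * lda;
        const float *a3 = a + (j + 3) * lda;

        for (size_t i = 0; i < m; i++) {
            y[i] += a0[i] * x0 + a1[i] * x1 + a2[i] * x2 + a3[i] * x3;
        }
    }

    /* Tail loop: handle the remaining n % 4 columns one at a time. */
    for (; j < n; j++) {
        const float xj = alpha * x[j];
        const float *aj = a + j * lda;

        for (size_t i = 0; i < m; i++) {
            y[i] += aj[i] * xj;
        }
    }
}
```

With n = 5, for example, the main loop handles columns 0 to 3 in one pass and the tail loop handles column 4. In a vectorized version, the inner loop over i would process a full vector of rows per iteration instead of a single element.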

fadara01 · 2025-03-03T10:23:12Z

Can we also update the plot so that:

- it's discontinuous: don't connect the points to each other
- it clearly shows the current state of affairs vs. this PR, maybe with a red horizontal line at speedup=1
- the Y-axis uses a log scale (it might make sense)
- the X-axis label is MxKxN, right?
- the number of threads and the machine appear in the title


martin-frbg · 2025-03-03T13:47:18Z

Yes, an explanation of the benchmark graph would be nice. I'm not sure I understand why your PR would result in two very narrow regions where there is more than a tenfold speedup?

manaalmj · 2025-03-07T11:44:52Z

> yes, an explanation of the benchmark graph would be nice - I'm not sure I understand why your PR would result in two very narrow regions where there is more than tenfold speedup?

Please refer to the updated graph. The spikes occur at problem sizes where the baseline performs poorly, which is why the speedup there is so large.
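To make the reasoning concrete: speedup is a ratio, so an unusually slow baseline at a given problem size inflates it even when the new kernel's throughput is unremarkable there. A minimal sketch, assuming both kernels are measured as throughput (for example GFLOP/s); the function name and the numbers in the note below are hypothetical and not taken from the benchmark:

```c
#include <assert.h>

/* Speedup of a new kernel over a baseline, both measured as throughput
 * for the same problem size. Hypothetical helper for illustration only. */
double speedup(double baseline_throughput, double new_throughput)
{
    assert(baseline_throughput > 0.0);
    return new_throughput / baseline_throughput;
}
```

With made-up figures, a new kernel at 8 units against a baseline at 4 gives a speedup of 2, while the same 8 units against a baseline that drops to 0.5 gives 16. The spike comes from the denominator, not from the new kernel being faster at that size.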

manaalmj · 2025-03-07T11:45:16Z

> Can we also update the plot s.t:
>
> - it's discontinuous - don't connect the points to each other
> - showing clearly what the current state of affairs is vs this PR - maybe a red horizontal line at speedup=1
> - It might also make sense to use log scale for the Y-axis.
> - X-axis label is MxKxN, right?
> - add the number of threads/machine to the title

Please see the updated graph.

fadara01 · 2025-03-10T15:05:31Z

@martin-frbg could you please re-review?


fadara01 reviewed Mar 3, 2025: kernel/arm64/gemv_n_sve.c (two review threads, both outdated and resolved)

manaalmj force-pushed the feature branch from f470592 to 6b9a969 Compare March 7, 2025 11:19

fadara01 reviewed Mar 10, 2025: kernel/arm64/KERNEL.ARMV8SVE (outdated, resolved)

manaalmj force-pushed the feature branch from 6b9a969 to 4e6ac98 Compare March 10, 2025 15:52

fadara01 reviewed Mar 10, 2025: kernel/arm64/gemv_n_sve.c (two review threads, both resolved)

Commit 5c4e38a: Optimize gemv_n_sve kernel

manaalmj force-pushed the feature branch from 4e6ac98 to 5c4e38a Compare March 10, 2025 16:39

martin-frbg added this to the 0.3.30 milestone Mar 12, 2025
