Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize gemv_n_sve kernel #5157

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from
Open

Conversation

manaalmj
Copy link

@manaalmj manaalmj commented Feb 28, 2025

Loop-unrolling of sgemv_n kernel is implemented with svmla_lane, along with a main loop and a tail loop.
The graph below shows performance improvement for OMP_NUM_THREADS=1 on Graviton-3:

speedup

@fadara01
Copy link

fadara01 commented Mar 3, 2025

Can we also update the plot s.t:

  • it's discontinuous - don't connect the points to each other
  • showing clearly what the current state of affairs is vs this PR - maybe a red horizontal line at speedup=1
  • It might also make sense to use log scale for the Y-axis.
  • X-axis label is MxKxN, right?
  • add the number of threads/machine to the title

@martin-frbg
Copy link
Collaborator

yes, an explanation of the benchmark graph would be nice - I'm not sure I understand why your PR would result in two very narrow regions where there is more than tenfold speedup ?

@manaalmj
Copy link
Author

manaalmj commented Mar 7, 2025

yes, an explanation of the benchmark graph would be nice - I'm not sure I understand why your PR would result in two very narrow regions where there is more than tenfold speedup ?

Please refer to the updated graph. For the regions where we see spikes, The baseline performance of these problem sizes are poor and that's why we see spike in speedup.

@manaalmj
Copy link
Author

manaalmj commented Mar 7, 2025

Can we also update the plot s.t:

  • it's discontinuous - don't connect the points to each other
  • showing clearly what the current state of affairs is vs this PR - maybe a red horizontal line at speedup=1
  • It might also make sense to use log scale for the Y-axis.
  • X-axis label is MxKxN, right?
  • add the number of threads/machine to the title

Please see the updated graph.

@fadara01
Copy link

@martin-frbg could you please re-review?

@martin-frbg martin-frbg added this to the 0.3.30 milestone Mar 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants