Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize get words #1015

Merged
merged 6 commits into from
Feb 9, 2024
Merged

Optimize get words #1015

merged 6 commits into from
Feb 9, 2024

Conversation

GiovanniBussi
Copy link
Member

I managed to optimize getWords.

@carlocamilloni can you also check the performance in this branch?

In short, there's a new getWords written ad hoc for plumed_cmd strings, which is significantly faster for the following two reasons:

  1. I removed the possibility to use complex syntax (parenthesis, tabs etc) in cmd string. I think this is fine, please let me know if anyone has concerns.
  2. I used a small_vector<std::string_view> to avoid memory allocations. std::string_view is c++17, small_vector is an external library that I included here, similarly to how we include lepton and other libraries. Notice that the small_vector library triggers a bug with gcc 8 that I fixed here.

Before merging this branch I want to check with the author of the small_vector library if my fix is ok. To me, small_vector is a very useful tool that might allow optimizations in many places in PLUMED (basically, whenever we waste time allocating small temporary vectors). If I cannot sort this out, I can easily reimplement the optimization of getWords using a fixed size std::array (say with a maximum size of 4 words), that would work certainly in case of interpreting plumed_cmd strings.

This PR closes #1011

I was planning to implement something similar but I found this
on GitHub and I decided to embed it.
https://github.com/gharveymn/small_vector

It seems well written (certainly better than what I would
do), does exactly what we need, and the licence allows this.

As usual this is putting everyting in PLMD namespace

For reference, this is d98388b21b73be55e8c36ee18809e19a6fa82d34
of the original repository
Unfortunately it seems gcc 8 has a bug and cannot recognize
a noexcept statement.

My understanding is that this statement can be removed since
it would just lead to a worse performance in some specific operation
on some types of objects, but not to errors. I thus added a check
so that this statement is removed by the preprocessor when using
GCC 8 or less.
I basically reimplemented a simpler version of getWords which
has some limitation:
- only understand space (" ") as separator (no newlines or tabs)
- do not understand braces

This implies we cannot anymore do:
plumed_cmd(p,"{init}")
or
plumed_cmd(p,"init\t")
I think we never meant plumed_cmd to be used in this way. If we want
to pass more structured strings, we can always pass them as the second
argument. So I guess this is not a problem.

Another way to make it faster is to avoid allocations. Instead
of returning a std::vector<std::string>, we accept
a reference to a preallocated small_vector<std::string_view>.
This should require literally zero new allocations, as long as the number
of words is low (for us it's always 0, 1 or 2). This required some modifications
to the PlumedMain code, because
- these std::string_view should be converted to std::string in some cases (with a subsequent allocation)
- I had to modify the word_map object to be able to access it with a std::string_view

With this optimization, and a simple test with a 20 atoms system, 4 of which used in a biased
CV, I can gain ~ 20% speed on my laptop
@GiovanniBussi GiovanniBussi added the wip Do not merge label Feb 2, 2024
@codecov-commenter
Copy link

codecov-commenter commented Feb 2, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (f210467) 84.18% compared to head (89193cd) 84.17%.

❗ Current head 89193cd differs from pull request most recent head 0329172. Consider uploading reports for the commit 0329172 to get more accurate results

Files Patch % Lines
src/core/PlumedMain.cpp 90.90% 2 Missing ⚠️
src/tools/Tools.cpp 91.66% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1015      +/-   ##
==========================================
- Coverage   84.18%   84.17%   -0.02%     
==========================================
  Files         612      613       +1     
  Lines       56497    56568      +71     
==========================================
+ Hits        47563    47614      +51     
- Misses       8934     8954      +20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@carlocamilloni
Copy link
Member

@GiovanniBussi confirmed, with 11,500 ns/day vs 10,600 ns/day without this and 10,700 ns/day before htt.
Here the gain appears in the total time (~9 s vs 10.6 s) but not in the detailed timers.

@carlocamilloni
Copy link
Member

as a comparison without plumed we are 25,000 ns/day

@GiovanniBussi
Copy link
Member Author

Here the gain appears in the total time (~9 s vs 10.6 s) but not in the detailed timers

This is expected. I think we are not including in the detailed timers what happens between when we call plumed_cmd and we start doing something. We always assumed this part of the code is not impacting performances.

I think this mostly relevant for very small (and irrelevant :-)) systems, but still it's a good exercise in hunting for bottlenecks.

@carlocamilloni
Copy link
Member

carlocamilloni commented Feb 2, 2024

i agree I tested a 110,000 atom using saxs (the one for which we had 2.5% loss of performances by moving to htt) and the loss of performance is still exactly 2.5% irrespectively of all the optimisations done from that time. The main difference is in the "waiting data" (left is today, middle is before htt, with the htt at that time)
Screenshot 2024-02-02 at 18 27 41
but this test is short (250 ps using 4 openMP threads)

@GiovanniBussi GiovanniBussi merged commit 0329172 into master Feb 9, 2024
32 of 34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wip Do not merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize getWords()
4 participants