Optimize get words #1015
I was planning to implement something similar, but I found this on GitHub and decided to embed it: https://github.com/gharveymn/small_vector
It seems well written (certainly better than what I would do), does exactly what we need, and the licence allows this. As usual, this puts everything in the PLMD namespace. For reference, this is d98388b21b73be55e8c36ee18809e19a6fa82d34 of the original repository.
Unfortunately, it seems gcc 8 has a bug and cannot recognize a noexcept statement. My understanding is that this statement can be removed, since doing so would only lead to worse performance in some specific operations on some types of objects, not to errors. I thus added a check so that this statement is removed by the preprocessor when using GCC 8 or older.
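A minimal sketch of this kind of preprocessor guard is shown here. The macro name `SKETCH_NOEXCEPT_IF` and the `Holder` type are hypothetical and are not taken from the small_vector sources or from the actual fix; the sketch only illustrates dropping a conditional noexcept on GCC 8 or older while keeping it elsewhere.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical macro (an assumption for illustration, not the real fix):
// on GCC 8 and older the conditional noexcept is dropped entirely,
// on every other compiler it is kept.
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ <= 8)
#  define SKETCH_NOEXCEPT_IF(...)
#else
#  define SKETCH_NOEXCEPT_IF(...) noexcept(__VA_ARGS__)
#endif

// Hypothetical type used only to show the macro in context.
template <class T>
struct Holder {
  T value;

  // Swapping is declared non-throwing only when T can be moved without
  // throwing. Without the specifier the code is still correct; callers
  // just lose the no-throw guarantee and may pick slower code paths.
  void swap(Holder& other)
    SKETCH_NOEXCEPT_IF(std::is_nothrow_move_constructible<T>::value &&
                       std::is_nothrow_move_assignable<T>::value) {
    using std::swap;
    swap(value, other.value);
  }
};
```

The behaviour of `swap` is the same with or without the specifier; only the compile-time no-throw information differs.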
I basically reimplemented a simpler version of getWords, which has some limitations:
- it only understands a space (" ") as separator (no newlines or tabs)
- it does not understand braces

This implies we can no longer do:
plumed_cmd(p,"{init}")
or
plumed_cmd(p,"init\t")

I think we never meant plumed_cmd to be used in this way. If we want to pass more structured strings, we can always pass them as the second argument. So I guess this is not a problem.

Another way to make it faster is to avoid allocations. Instead of returning a std::vector<std::string>, we accept a reference to a preallocated small_vector<std::string_view>. This should require literally zero new allocations, as long as the number of words is low (for us it's always 0, 1 or 2).

This required some modifications to the PlumedMain code, because:
- these std::string_view should be converted to std::string in some cases (with a subsequent allocation)
- I had to modify the word_map object to be able to access it with a std::string_view

With this optimization, in a simple test with a 20-atom system, 4 atoms of which are used in a biased CV, I gain ~20% speed on my laptop.
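The two ideas above can be sketched as follows. This is not the code in this PR: `splitOnSpaces`, `WordMap` and `lookup` are hypothetical names, `std::vector` stands in for small_vector to keep the example self-contained, and the integer values in the map are made up. The map uses `std::less<>` so that it can be searched with a `std::string_view` without building a temporary `std::string`.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `line` on spaces only (no tabs, newlines or
// braces) and stores views into the caller-provided container.
// std::vector is a stand-in for small_vector in this sketch.
inline void splitOnSpaces(std::string_view line,
                          std::vector<std::string_view>& words) {
  words.clear();  // keeps the capacity, so a reused vector need not reallocate
  std::size_t pos = 0;
  while (pos < line.size()) {
    // Skip any run of spaces before the next word.
    while (pos < line.size() && line[pos] == ' ') ++pos;
    if (pos == line.size()) break;
    std::size_t end = line.find(' ', pos);
    if (end == std::string_view::npos) end = line.size();
    words.push_back(line.substr(pos, end - pos));  // a view, not a copy
    pos = end;
  }
}

// A map with a transparent comparator (std::less<>) accepts std::string_view
// keys in find(), so no std::string has to be allocated for the lookup.
using WordMap = std::map<std::string, int, std::less<>>;

// Returns the mapped value, or -1 if the key is absent (sketch convention).
inline int lookup(const WordMap& m, std::string_view key) {
  auto it = m.find(key);
  return it == m.end() ? -1 : it->second;
}
```

The views stored in `words` point into the original string, so they are only valid while that string is alive; converting one to `std::string` is what triggers an allocation.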
Codecov Report

@@            Coverage Diff             @@
##           master    #1015      +/-   ##
==========================================
- Coverage   84.18%   84.17%   -0.02%
==========================================
  Files         612      613       +1
  Lines       56497    56568      +71
==========================================
+ Hits        47563    47614      +51
- Misses       8934     8954      +20
@GiovanniBussi confirmed, with 11,500 ns/day vs 10,600 ns/day without this and 10,700 ns/day before htt.
As a comparison, without plumed we are at 25,000 ns/day.
This is expected. I think we are not including in the detailed timers what happens between when we call [...] I think this is mostly relevant for very small (and irrelevant :-)) systems, but still it's a good exercise in hunting for bottlenecks.
I managed to optimize getWords.
@carlocamilloni can you also check the performance in this branch?
In short, there's a new getWords written ad hoc for plumed_cmd strings, which is significantly faster for the following two reasons:
- It is simpler: it only recognizes a space as separator and does not handle braces.
- It uses small_vector<std::string_view> to avoid memory allocations. std::string_view is C++17; small_vector is an external library that I included here, similarly to how we include lepton and other libraries. Notice that the small_vector library triggers a bug with gcc 8 that I fixed here.

Before merging this branch I want to check with the author of the small_vector library whether my fix is OK. To me, small_vector is a very useful tool that might allow optimizations in many places in PLUMED (basically, whenever we waste time allocating small temporary vectors). If I cannot sort this out, I can easily reimplement the optimization of getWords using a fixed-size std::array (say, with a maximum size of 4 words), which would certainly work for interpreting plumed_cmd strings.

This PR closes #1011
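The std::array fallback mentioned above could look roughly like this. It is only a sketch under stated assumptions: `Words`, `splitFixed` and `maxWords` are hypothetical names, and reporting overflow through a `bool` return value is one possible choice, not something decided in this PR.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// The limit of 4 words is the figure suggested in the description above.
constexpr std::size_t maxWords = 4;

// Hypothetical fixed-capacity result: no heap allocation at all.
struct Words {
  std::array<std::string_view, maxWords> items{};
  std::size_t count = 0;
};

// Splits `line` on spaces into `out`. Returns false if the line contains
// more than maxWords words (in that case only the first maxWords are stored).
inline bool splitFixed(std::string_view line, Words& out) {
  out.count = 0;
  std::size_t pos = 0;
  while (true) {
    pos = line.find_first_not_of(' ', pos);  // start of the next word
    if (pos == std::string_view::npos) return true;  // nothing left to read
    if (out.count == maxWords) return false;         // would overflow
    std::size_t end = line.find(' ', pos);
    if (end == std::string_view::npos) end = line.size();
    out.items[out.count++] = line.substr(pos, end - pos);
    pos = end;
  }
}
```

Compared with small_vector, this never falls back to the heap, so a line with too many words has to be rejected or handled by the caller.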