-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove Thrust #809
Comments
Background notesBuilding TBB as a static library is not recommened and is only supported because Intel has a "bigiron" business requirement. https://github.com/jckarter/tbb/blob/master/build/big_iron.inc Godot Engine doesn't use openmp because that requires a "MSVC redistributable". https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170 Edited: As far as I know openmp degrades nicely though, but it's also different from C++17 https://stackoverflow.com/questions/67848884/c-compiler-support-for-stdexecution-parallel-stl-algorithms |
I think PSTL is easier to switch to. It lacks some special APIs, but we can probably implement our own. The main issue here is compiler support, e.g. we need GCC 13 or libc++ to properly use it with onetbb. Using TBB directly will require a lot of work. Some algorithms are not that easy to implement efficiently. OpenMP is probably not an option. We tried that before and the performance is not that good, at least for thrust impelmentation. |
I'm thoroughly confused about PSTL and TBB, so I cannot really comment here. ..but if PSTL is part of c++17 that will get my vote. We already package TBB, so that should be easy to keep supporting. But searching around give me the feeling that PSTL and TBB are not particularly compatible? https://community.intel.com/t5/Intel-oneAPI-Threading-Building/Is-PSTL-still-supported-by-TBB/m-p/1487798 |
@kintel They are compatible, but it depends on the versions... uxlfoundation/oneTBB#332 Basically:
Note that for 3, I only tested relatively new LLVM version. Not sure about which version is the oldest supported version. Probably require a relatively new version (https://reviews.llvm.org/D141779). And it seems that the PSTL support on libc++ is quite incomplete (https://libcxx.llvm.org/Status/PSTL.html), but that may be about PSTL support with other backends? |
This all sounds like a bit of a nightmare if targeting Linux distro packaging though, but perhaps that shouldn't be driving design decisions too much.. |
Here's a decision table for compiler support godotengine/godot#91833 |
gcc-13 It becomes a huge issue for providing recent application versions, a.k.a. dev builds though:
Long story short, if gcc-13 will be a requirement, that will kill almost all OpenSCAD dev builds we currently provide for older distributions and will make AppImages impossible for a couple of years. c++17 |
I'm curious what other libraries use. We should not be the first one hitting this compatibility issue? And yeah, I don't think we want to make gcc-13 a requirement. I don't think we can rely solely on PSTL for now. |
@pca006132 do you have a listing of all the thrust apis we use? It'll help us select another option. |
I think the slightly trickier ones to implement are things like |
Regarding compatibility - my impression is PSTL and TBB are related a little like Thrust and CUB. They have slightly different APIs and TBB and CUB are slightly lower-level. But mostly: PSTL/Thrust is really just APIs, while TBB and CUB have actual parallel algorithm implementations. So I think TBB using PSTL was probably a bootstrap to get some OpenMP support before TBB was finished or something. Nowadays it seems we're in a PSTL calls TBB (or OpenMP) under the hood kind of situation, which is much how we currently use Thrust. |
Gathered by chatgpt4 from THRUST_DYNAMIC_BACKEND(copy_if, void)
THRUST_DYNAMIC_BACKEND_VOID(exclusive_scan)
THRUST_DYNAMIC_BACKEND_VOID(for_each)
THRUST_DYNAMIC_BACKEND_VOID(for_each_n)
THRUST_DYNAMIC_BACKEND(gather_if, void)
THRUST_DYNAMIC_BACKEND_VOID(gather)
THRUST_DYNAMIC_BACKEND(reduce_by_key, void)
THRUST_DYNAMIC_BACKEND_VOID(scatter)
THRUST_DYNAMIC_BACKEND_VOID(sequence)
THRUST_DYNAMIC_BACKEND(transform_reduce, void)
STL_DYNAMIC_BACKEND(all_of, bool)
STL_DYNAMIC_BACKEND(count_if, int)
STL_DYNAMIC_BACKEND_VOID(copy)
STL_DYNAMIC_BACKEND_VOID(copy_n)
STL_DYNAMIC_BACKEND(find_if, void)
STL_DYNAMIC_BACKEND(find, void)
STL_DYNAMIC_BACKEND(fill, void)
STL_DYNAMIC_BACKEND(inclusive_scan, void)
STL_DYNAMIC_BACKEND(is_sorted, bool)
STL_DYNAMIC_BACKEND(remove_if, void)
STL_DYNAMIC_BACKEND(remove, void)
STL_DYNAMIC_BACKEND(reduce, void)
STL_DYNAMIC_BACKEND_VOID(stable_sort)
STL_DYNAMIC_BACKEND_VOID(transform)
STL_DYNAMIC_BACKEND_VOID(uninitialized_copy)
STL_DYNAMIC_BACKEND_VOID(uninitialized_fill) Feel free to edit my list. |
I think regarding old compilers that have bugs or lack support for certain PSTL algorithms, we should just let those fall back to single-threaded. Then it should work everywhere, but it'll be fastest on the latest platforms. That feels like a reasonable compromise regarding maintainability and compatibility. I don't think we can afford to optimize performance heavily for every old platform. |
Besides, I feel like on average we only get ~2x speedup for parallel over single-threaded anyway. CPU pipelining is pretty good when your algorithms are parallelized! |
I think there can be 4x speedup, and probably more if we can optimize mesh simplification better. The major issue with old vs new platform is that people like to have a single binary, e.g. appimage for openscad, and that means they need to use the single threaded version for several years. |
@elalish btw, by "So I think TBB using PSTL was probably a bootstrap to get some OpenMP support before TBB was finished or something." do you mean "So I think PSTL using TBB was probably a bootstrap to get some OpenMP support before PSTL was finished or something"? TBB does not depend on PSTL. |
Maybe I misunderstood what you were saying earlier. Either way it's confusing enough we should probably chat about it sometime face-to-face. |
For the record, our current goal is to get rid of thrust and use PSTL for parallelization. Users with GCC 12 or older will hit #787. They can either disable multicore (it is slower, but typically not that slow) or accept the leak. Considering other users, e.g. openscad, did not report such leak causing an issue, this should be acceptable. And if needed, we can have some intermediate option, where we use tbb for_each directly but no PSTL algorithms. This will be slower than using every parallel APIs, but the user can still get some multicore performance improvement without having to live with memory leak. |
Thrust is now deprecated, and we've been wanting to move off it for awhile anyway since we're no longer using CUDA. Thrust is turning in CCCL - getting more integrated with CUDA, so we don't need that. Thrust gave birth to PSTL, which is pretty widely supported now (C++17). PSTL appears to be backed by TBB and/or OpenMP in most compiler's standard libraries.
I think the big question is: do we switch to PSTL or TBB? What's your opinion, @pca006132? Related: #520
My impression is PSTL might be easier to switch to since the API shape is close to Thrust.
On the other hand, TBB is lower-level and so may have more performance, and we already have a little TBB code.
@fire @kintel thoughts on what would be easiest to consume as far as dependencies from a downstream perspective?
The text was updated successfully, but these errors were encountered: