Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SIMD option to compile system and implement SIMD for vector4. #86340

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Daylily-Zeleen
Copy link
Contributor

@Daylily-Zeleen Daylily-Zeleen commented Dec 19, 2023

To implement SIMD in godot math is a huge work, this is the firt step.

I'm not sure if it's worth, and this change lack of testing.
If this is not worth, I will stop progress and remain in my branch.


CPU: I7-8750H
Build options: platform=windows arch=x86_x64 target=template_release debug_symbols=no bits=64 tests=yes optimize=speed

Results:
test_results.md


As the results show, SIMD does not always bring better speed, I don't know whether there have mistakes in this pr.
I remaind my test in "test_vector4.h", please check it.

And I have arm platform, build with "simd_type=neon" need test by others.

@fire
Copy link
Member

fire commented Dec 19, 2023

I believe that we can support arm64 and x86_64 simd. #83648 so I support this. I tried to do some work in the mikktspace pr.

Edited:

My changes were moved to https://github.com/V-Sekai/godot/tree/mikktspace-optimization-simd

@Daylily-Zeleen Daylily-Zeleen force-pushed the daylily-zeleen/simd_for_vector4 branch 3 times, most recently from faba070 to a8d745f Compare December 20, 2023 08:50
@AThousandShips AThousandShips added this to the 4.x milestone Dec 20, 2023
@Daylily-Zeleen Daylily-Zeleen force-pushed the daylily-zeleen/simd_for_vector4 branch from a8d745f to 693a941 Compare December 20, 2023 13:13
@AThousandShips
Copy link
Member

AThousandShips commented Dec 20, 2023

While this is a great addition IMO there needs to be some way to make the code readable, some helper methods or macros for example, this code is very hard to follow with the many options

@Daylily-Zeleen Daylily-Zeleen force-pushed the daylily-zeleen/simd_for_vector4 branch from 693a941 to cedeec4 Compare December 21, 2023 11:40
@Daylily-Zeleen Daylily-Zeleen force-pushed the daylily-zeleen/simd_for_vector4 branch from cedeec4 to 14d8314 Compare December 21, 2023 11:50
@Daylily-Zeleen Daylily-Zeleen marked this pull request as ready for review December 21, 2023 12:14
@Daylily-Zeleen Daylily-Zeleen requested review from a team as code owners December 21, 2023 12:14
@Daylily-Zeleen
Copy link
Contributor Author

While this is a great addition IMO there needs to be some way to make the code readable, some helped methods or macros for example, this code is very hard to follow with the many options

I have not idea about improving readable. The current is already the best implementation I can achieve.

@lawnjelly
Copy link
Member

lawnjelly commented Dec 21, 2023

Imo it's a better idea to keep the results in the post rather than a separate file, which provides a barrier to people reading it (and it is important to show that the gains here are minimal).
There is a "details" tag which can be used for this if necessary, but longhand is fine too.

There have been several discussions regarding SIMD in proposals:
godotengine/godot-proposals#290
godotengine/godot-proposals#4563
godotengine/godot-proposals#7778

Bear in mind we need to come to a consensus on a proposal before adding a new feature. This has happened a few times with new contributors not understanding the proposal / pre-approval process.

There are several problems to consider here:

  • SIMD only really shines if instructions are used in loops. This is largely why your performance figures aren't great, the overhead of calling functions will in most cases dwarf any gains.
  • Data layout / cache friendly access are super important at this level.
  • We ideally want to abstract SIMD and not pollute the core code with SIMD stuff, as this both makes it more difficult to read, but more difficult to modify for other contributors. (They might believe that in order to make a change, they need to alter reference code, SSE and NEON, which is a big ask.)
  • We need to remember that hard coding architecture at compile time means putting a minimum hardware limit (although e.g. SSE2 is mandated on x86 64 bit so this can be taken advantage of).
  • CPU caps detection and dynamic dispatch is generally preferable as it allows using all flavours of SIMD whatever the caps matrix, with backward compatibility. Although dynamic dispatch makes more sense in loops.
  • If hard coding architecture we may be able to use auto-vectorization for simple stuff like add / subtract / multiply, and only need bespoke code for things like inverse sqrt. In fact you should check existing assembly output, some of this might already be auto-vectorized, which would explain why you are seeing little difference in timings.
  • There are libraries for abstracting SIMD which might be considered for some cases instead of writing e.g. specific SSE or NEON.

On the plus side, just ensuring that these single instruction functions take full advantage of SIMD may still be worth doing as they require little changes to most code, and might get a reasonable gain.

I'm personally of the opinion that eventually offering both some conversion for existing single instructions (as this PR does), and offering single instruction multiple data function APIs (as in godotengine/godot-proposals#290 is a good way forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants