Add SIMD option to compile system and implement SIMD for vector4. #86340

Daylily-Zeleen · 2023-12-19T20:32:25Z

To implement SIMD in godot math is a huge work, this is the firt step.

I'm not sure if it's worth, and this change lack of testing.
If this is not worth, I will stop progress and remain in my branch.

CPU: I7-8750H
Build options: platform=windows arch=x86_x64 target=template_release debug_symbols=no bits=64 tests=yes optimize=speed

Results:
test_results.md

As the results show, SIMD does not always bring better speed, I don't know whether there have mistakes in this pr.
I remaind my test in "test_vector4.h", please check it.

And I have arm platform, build with "simd_type=neon" need test by others.

fire · 2023-12-19T20:59:44Z

I believe that we can support arm64 and x86_64 simd. #83648 so I support this. I tried to do some work in the mikktspace pr.

Edited:

My changes were moved to https://github.com/V-Sekai/godot/tree/mikktspace-optimization-simd

AThousandShips · 2023-12-20T13:25:31Z

While this is a great addition IMO there needs to be some way to make the code readable, some helper methods or macros for example, this code is very hard to follow with the many options

Daylily-Zeleen · 2023-12-21T12:20:35Z

While this is a great addition IMO there needs to be some way to make the code readable, some helped methods or macros for example, this code is very hard to follow with the many options

I have not idea about improving readable. The current is already the best implementation I can achieve.

lawnjelly · 2023-12-21T12:59:30Z

Imo it's a better idea to keep the results in the post rather than a separate file, which provides a barrier to people reading it (and it is important to show that the gains here are minimal).
There is a "details" tag which can be used for this if necessary, but longhand is fine too.

There have been several discussions regarding SIMD in proposals:
godotengine/godot-proposals#290
godotengine/godot-proposals#4563
godotengine/godot-proposals#7778

Bear in mind we need to come to a consensus on a proposal before adding a new feature. This has happened a few times with new contributors not understanding the proposal / pre-approval process.

There are several problems to consider here:

SIMD only really shines if instructions are used in loops. This is largely why your performance figures aren't great, the overhead of calling functions will in most cases dwarf any gains.
Data layout / cache friendly access are super important at this level.
We ideally want to abstract SIMD and not pollute the core code with SIMD stuff, as this both makes it more difficult to read, but more difficult to modify for other contributors. (They might believe that in order to make a change, they need to alter reference code, SSE and NEON, which is a big ask.)
We need to remember that hard coding architecture at compile time means putting a minimum hardware limit (although e.g. SSE2 is mandated on x86 64 bit so this can be taken advantage of).
CPU caps detection and dynamic dispatch is generally preferable as it allows using all flavours of SIMD whatever the caps matrix, with backward compatibility. Although dynamic dispatch makes more sense in loops.
If hard coding architecture we may be able to use auto-vectorization for simple stuff like add / subtract / multiply, and only need bespoke code for things like inverse sqrt. In fact you should check existing assembly output, some of this might already be auto-vectorized, which would explain why you are seeing little difference in timings.
There are libraries for abstracting SIMD which might be considered for some cases instead of writing e.g. specific SSE or NEON.

On the plus side, just ensuring that these single instruction functions take full advantage of SIMD may still be worth doing as they require little changes to most code, and might get a reasonable gain.

I'm personally of the opinion that eventually offering both some conversion for existing single instructions (as this PR does), and offering single instruction multiple data function APIs (as in godotengine/godot-proposals#290 is a good way forward.

Daylily-Zeleen force-pushed the daylily-zeleen/simd_for_vector4 branch 3 times, most recently from faba070 to a8d745f Compare December 20, 2023 08:50

AThousandShips added enhancement topic:core performance labels Dec 20, 2023

AThousandShips added this to the 4.x milestone Dec 20, 2023

Daylily-Zeleen force-pushed the daylily-zeleen/simd_for_vector4 branch from a8d745f to 693a941 Compare December 20, 2023 13:13

Daylily-Zeleen force-pushed the daylily-zeleen/simd_for_vector4 branch from 693a941 to cedeec4 Compare December 21, 2023 11:40

Add SIMD option to compile system and implement SIMD for Vector4.

14d8314

Daylily-Zeleen force-pushed the daylily-zeleen/simd_for_vector4 branch from cedeec4 to 14d8314 Compare December 21, 2023 11:50

Daylily-Zeleen marked this pull request as ready for review December 21, 2023 12:14

Daylily-Zeleen requested review from a team as code owners December 21, 2023 12:14

Calinou mentioned this pull request Jun 29, 2024

Use SIMD vectorization intrinsics in the core godotengine/godot-proposals#4563

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add SIMD option to compile system and implement SIMD for vector4. #86340

Add SIMD option to compile system and implement SIMD for vector4. #86340

Daylily-Zeleen commented Dec 19, 2023 •

edited

Loading

fire commented Dec 19, 2023 •

edited

Loading

AThousandShips commented Dec 20, 2023 •

edited

Loading

Daylily-Zeleen commented Dec 21, 2023

lawnjelly commented Dec 21, 2023 •

edited

Loading

Add SIMD option to compile system and implement SIMD for vector4. #86340

Are you sure you want to change the base?

Add SIMD option to compile system and implement SIMD for vector4. #86340

Conversation

Daylily-Zeleen commented Dec 19, 2023 • edited Loading

fire commented Dec 19, 2023 • edited Loading

AThousandShips commented Dec 20, 2023 • edited Loading

Daylily-Zeleen commented Dec 21, 2023

lawnjelly commented Dec 21, 2023 • edited Loading

Daylily-Zeleen commented Dec 19, 2023 •

edited

Loading

fire commented Dec 19, 2023 •

edited

Loading

AThousandShips commented Dec 20, 2023 •

edited

Loading

lawnjelly commented Dec 21, 2023 •

edited

Loading