Inline String::utf8 and String::utf16 for their simplicity. #101356
base: master
@@ -523,11 +523,19 @@ class String {
CharString ascii(bool p_allow_extended = false) const;
CharString utf8() const;
Error parse_utf8(const char *p_utf8, int p_len = -1, bool p_skip_cr = false);
static String utf8(const char *p_utf8, int p_len = -1);
Maybe _FORCE_INLINE_, even if the compiler will probably do it either way.
Personally I'm not a big fan of _FORCE_INLINE_; for most purposes I think it is unnecessary (the compiler will usually inline trivial functions like this anyway).
But you're right, I do know it's quite common in the codebase and often requested by reviewers. I'll leave this open for now to let others weigh in.
That could be useful sometimes. For example, a debug build will not inline them (debug builds do not inline anything by default), but if your debug build is too slow and what you want to debug is not in an inline function, you can force all those _FORCE_INLINE_ functions to be inlined even in debug with /d2Obforceinline or /Ob1 (in MSVC) to regain some speed.
That's a good point actually. Making non-optimizing builds bearable in speed is a worthwhile endeavour.
I'm also not a big fan of _FORCE_INLINE_. I think Godot overuses it in general and it should be more targeted; there is little consideration of the downsides.
But I did do a test a few years ago and runtime performance was admittedly a tad faster with it overall (versus being macro'd out).
Note it's compiled out in DEV builds.
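For readers unfamiliar with how such a macro can be "compiled out", here is a minimal sketch of a force-inline macro that falls back to plain inline in development builds. The names EXAMPLE_FORCE_INLINE and EXAMPLE_DEV_BUILD are hypothetical and chosen for illustration; the engine's actual macro definitions and build flags may differ.

```cpp
#include <cassert>

// Hypothetical names for illustration only; not the engine's real definitions.
#if defined(EXAMPLE_DEV_BUILD)
// Development builds: only hint at inlining, so stepping in a debugger stays easy.
#define EXAMPLE_FORCE_INLINE inline
#elif defined(_MSC_VER)
// MSVC spelling of a strong inlining request.
#define EXAMPLE_FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
// GCC/Clang spelling of a strong inlining request.
#define EXAMPLE_FORCE_INLINE __attribute__((always_inline)) inline
#else
// Unknown compiler: fall back to the standard keyword.
#define EXAMPLE_FORCE_INLINE inline
#endif

// A trivial function of the kind discussed in this thread.
EXAMPLE_FORCE_INLINE int add_one(int p_value) {
	return p_value + 1;
}
```

Defining EXAMPLE_DEV_BUILD on the command line switches every annotated function back to an ordinary inline hint without touching call sites, which is the trade-off being debated here.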
LTO does inlining, so I don't think it's necessary.
macOS, iOS, Android and MSVC builds do not have LTO enabled. Plus, by the time LTO comes around, many opportunities for optimization have already passed, so at that stage it's much less guaranteed to actually help.
LTO can be used on these platforms.
I previously used I also ran micro benchmarks with LTO and the linker optimized it at the same level.
This discussion is outside the scope of this PR. I do agree that checking whether LTO should be enabled on other platforms might be warranted. Feel free to make a godot-proposal for this. I can point you in the right direction - searching for
For most situations, adding or removing
I suppose this is expected in most situations, but again, by LTO time a lot of opportunities for optimization have passed. Even if the functions were compiled to the same binary in some situations, it may unexpectedly affect others. LTO is not an "it behaves as though both are the same module" drop-in thing.
Hah, I don't want to get into all this micro-optimization. But maybe you didn't look well. Lines 68 to 77 in 0257995
In general, I'm against these inlines because they increase compile time and are almost useless since LTO already does it.
Sure, but afaik this is not performed in official builds :)
If LTO inlines anyway then forcing or encouraging an inline won't affect compile time.
Ah. Now I see it here.
I mean most builds are done without LTO to see if everything works, run unit tests and check for regressions.
Just to be clear, official builds are made with Depending on the platform and compiler we have mixed results with LTO: it seems to be quite beneficial with GCC (increases performance and reduces binary size), while it's less clear-cut with Clang (increases performance but also increases size significantly with heavy inlining, and Clang full LTO is egregiously slow, with no parallel linking, while its ThinLTO makes binaries even bigger), and it's seemingly unusable with MSVC LTCG.
33a34fe to 67cdb1d
Tested locally, it works as expected. Code looks good to me.
Binary size for a Linux x86_64 stripped optimized build (production=yes lto=full) goes up by 8 KB for the editor, 4 KB for a release export template.
This PR is a bit faster to start up and shut down the editor on an empty project:
❯ hyperfine -m25 -iw1 "bin/godot.linuxbsd.editor.x86_64.master /tmp/4/project.godot --quit" "bin/godot.linuxbsd.editor.x86_64 /tmp/4/project.godot --quit"
Benchmark 1: bin/godot.linuxbsd.editor.x86_64.master /tmp/4/project.godot --quit
Time (mean ± σ): 4.099 s ± 0.470 s [User: 2.517 s, System: 0.555 s]
Range (min … max): 3.661 s … 4.640 s 25 runs
Benchmark 2: bin/godot.linuxbsd.editor.x86_64 /tmp/4/project.godot --quit
Time (mean ± σ): 3.994 s ± 0.434 s [User: 2.522 s, System: 0.562 s]
Range (min … max): 3.648 s … 4.654 s 25 runs
That's odd, I wonder where that's coming from. I would expect the functions to get inlined. Perhaps it's a bunch of domino effects adding up to the difference? 🤔 Edit: I recreated this locally. Seems like this PR is 13 KB larger than master for me (macOS, with debug symbols enabled in
Continuing from #101352 (comment) - might as well get it out of the way now.
The functions are trivial, and inlining them allows the compiler to reason better about them. Technically it's a tiny optimization since the compiler may now optimize the intermediate String away (though it won't make a practical difference).
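To illustrate the shape of the change, here is a minimal sketch of a static factory defined in the class body as a thin wrapper over a parse method. MiniString is a hypothetical stand-in backed by std::string, not the engine's String class, and the wrapper body is an assumption about what such a trivial function looks like rather than a copy of the PR's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for illustration; not the engine's String implementation.
class MiniString {
	std::string data;

public:
	enum Error { OK, ERR_INVALID_DATA };

	// In a real codebase this would live in a .cpp file and do actual UTF-8 decoding.
	// Here it only copies the bytes so the example stays self-contained.
	Error parse_utf8(const char *p_utf8, int p_len = -1) {
		if (p_utf8 == nullptr) {
			return ERR_INVALID_DATA;
		}
		const std::size_t len = p_len < 0 ? std::strlen(p_utf8) : static_cast<std::size_t>(p_len);
		data.assign(p_utf8, len);
		return OK;
	}

	// Defined inside the class body, so it is implicitly inline. Because the
	// definition is visible at every call site, the compiler is free to elide
	// the temporary `ret` instead of making an opaque call.
	static MiniString utf8(const char *p_utf8, int p_len = -1) {
		MiniString ret;
		ret.parse_utf8(p_utf8, p_len);
		return ret;
	}

	const std::string &str() const { return data; }
};
```

A call such as MiniString::utf8("hello") then behaves the same as constructing an empty object and calling parse_utf8 on it; the only difference is where the wrapper's definition is visible to the optimizer.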