-
-
Notifications
You must be signed in to change notification settings - Fork 22k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Float literals - fix main primitives to use .f #57703
Float literals - fix main primitives to use .f #57703
Conversation
On a similar note, should using |
I'm personally ok with casting in this situation, as it makes it explicit. It did fox me the first time though. Don't know what others think about solutions for this. |
2f80f25
to
6c6dcba
Compare
Worth mentioning as well, an alternative approach to handling constants, rather than casting where used, it to define them as e.g.:
However this won't work for some of the constants that have more significant figures e.g.:
because in some circumstances we may WANT to use them as doubles even when compiling with 32bit We could alternatively have defines for real_t and double, something like this:
or perhaps the opposite might be more practical, i.e. have the
I can change the PR to use this type of approach if it is preferred, there are advantages and disadvantages to each. Having (real_t) in the actual code is more verbose, but on the other hand it is more obvious what is going on to a reader, and might be more newbie friendly than being "clever" with the defines. |
PR Meeting: Split into 2 PRs, one for .f and one for (real_t) casts, and do all the key math stuff in these PRs (vectors, math etc). We won't bother with non-bottleneck stuff, and keep the standard approach as using double literal, and just fix up in future where necessary. |
6c6dcba
to
b39be40
Compare
Converts float literals from double format (e.g. 0.0) to float format (e.g. 0.0f) where appropriate for 32 bit calculations.
b39be40
to
5298e16
Compare
b < 0.04045 ? b * (1.0 / 12.92) : Math::pow((b + 0.055) * (1.0 / (1 + 0.055)), 2.4), | ||
r < 0.04045f ? r * (1.0 / 12.92) : Math::pow((r + 0.055f) * (float)(1.0 / (1 + 0.055)), 2.4f), | ||
g < 0.04045f ? g * (1.0 / 12.92) : Math::pow((g + 0.055f) * (float)(1.0 / (1 + 0.055)), 2.4f), | ||
b < 0.04045f ? b * (1.0 / 12.92) : Math::pow((b + 0.055f) * (float)(1.0 / (1 + 0.055)), 2.4f), | ||
a); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note the (float)
casting is on purpose here rather than in a later PR - it needs to be done to prevent a warning using pow
, otherwise the version of pow to call (float or double) is unclearly defined.
Just to note, this should be good to go now. 👍 Could do with someone else glancing through. It's pretty much a case of replacing with .f, except for special cases where double precision might be needed. I might have missed the odd conversion, but it's not crucial, we are just aiming to get most of those using 32 bit math when compiled real_t as float. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me (very cursory review).
Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is already merged but I decided to look over it anyway and I give it a ✅
Everything in this PR falls into one of these 3 categories:
- The float literal has an identical value to the double literal (such as sums of powers of two
0.5f
or integers0.0f
,1.0f
, etc) - The value is something we don't need the exact value of (such as checking if a value is
> 0.9999f
) - The value is in Color, which should be always 32-bit anyway (since more precision is useless for Color)
Converts float literals from double format (e.g.
0.0
) to float format (e.g.0.0f
) where appropriate for 32 bit calculations.As discussed in the PR meeting, there was a desire for 2 PRs, this one, including changes to use
0.0f
float format literals, and another one for casts toreal_t
etc.As discussed, this covers the main classes in
core/math
which are most likely to be bottlenecks, rather than the whole codebase. The exception is BVH which makes more sense to do in the next sync with 3.x.See #56396 for details.
Introduction
After some investigation using godbolt using clang, gcc, msvc, ARM and all major compilers, it turns out that our technique of using double literals in float calculations
0.0
is problematic because it causes promotion of the calculation to occur in 64 bit.This involves:
This is considerably less efficient than simply doing the calculation in 32 bit in a single instruction, which is normally the intention. The difference this makes in terms of speed / power use etc will depend on the platform - some mobiles may be particularly sensitive. The binary may also be slightly smaller too, but probably not by a huge margin.
There appear to be 3 obvious ways of resolving this problem:
0.0f
float
(orreal_t
depending on usage)real_t
Caveats
Although these calculations will be be faster, the result may in some cases be very slight smidgen different. Although the arguments and results are to all extents and purposes 32 bit in the before and after case, the calculation was done at a higher precision when 64 bit was (erroneously) used.
This is equivalent to the difference between the result of 80 bit floating point calculations converted to 64 bit on the old x86 instructions. Something to be aware of when attempting e.g. deterministic repeatability, but hopefully not hugely important in terms of result accuracy (although we might watch for any possible difference in e.g. physics). This would have been nice to have avoided by doing all the calculations at 32 bit from the get go, but it seems worth investigating this change.