-
-
Notifications
You must be signed in to change notification settings - Fork 22k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Font
calculations by avoiding unnecessary copy-on-write.
#102132
Optimize Font
calculations by avoiding unnecessary copy-on-write.
#102132
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch !
I believe that CoW forced on and the edge cases on a case-by-case issue were historical improvements. https://godotengine.org/article/why-we-broke-your-pr/ So I don't suggest flipping it back. |
My suggestion would be to double down and do the same to Array as was done to Vector, for the same reason outlined by your linked article :) In other words, the opposite of flipping it back! |
This change should be completely safe, so might be worth moving it to 4.4 milestone. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested locally, it works as expected.
Scrolling the Scene tree dock with 16385 nodes in it feels noticeably smoother, especially near the bottom (tested on an Linux optimized editor build with LTO).
if (TS->font_has_char(rids.get(i), p_char)) { | ||
int32_t glyph = TS->font_get_glyph_index(rids.get(i), p_font_size, p_char, 0); | ||
TS->font_draw_glyph(rids.get(i), p_canvas_item, p_font_size, p_pos, glyph, p_modulate); | ||
return TS->font_get_glyph_advance(rids.get(i), p_font_size, glyph).x; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we gain anything from extracting rids.get(i)
to a const variable, or will the compiler optimize it all the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question!
Array.get
has the overhead of a function call, an if statement that is likely to be predicted correctly (read_only
), and another if statement that is likely to be predicted correctly (bad index). If the array is large, there will also be an unnecessary cache contention (through repeated access of size()
and dereferencing ArrayPrivate
).
All in all, it could be optimized, but it's unlikely to make an observable difference, since all these factors should add up to only ~20 wasted clock cycles1 for each get
call in most cases - and draw_char
isn't particularly problematic in the profiler right now anyway.
If I were to make another rebase, I'd probably cache the reference to be safe, but through this estimation I don't think it's particularly important.
Footnotes
-
Napkin math via Agner Fog's heuristics. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
60 cycles gained for adding a single line seems like a good trade off to me :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually implemented this and was close to pushing, but then found this bug when checking related code to make sure it's safe.
It doesn't appear that Font.rids
is read-only, but I didn't want to risk ruining a simple PR by introducing a potential regression, so I would like to keep it as-is for now. I plan to make a PR switching Font.rids
to Vector
for 4.5 anyway, and I'll just add the cache there I think.
Thanks! |
UBreakIterator
instances. #102129.I profiled scrolling across 4000
Node
entries in the scene tree.I was able to improve its performance by 20% by avoiding unnecessary copy-on-write calls.
Explanation
Array[i]
has a const and a non-const implementation.When a mutable instance is available, the non-const implementation is always chosen. This version implies a fork of the
CowData
(copy-on-write).Font
has amutable TypedArray<RID> rids
- because it ismutable
, it is always non-const. For this reason, it will always cause forks onoperator[]
access.The
.get
function does not have a non-const implementation. As such, it does not cause forks.Profiler
I profiled the scrolling across 4000 items in the scene tree.
I used Apple Instruments' CPU profiler and inverted the call tree.
Before (after this PR, this entry is completely gone).
Caveats
It would be better to eliminate the non-const
operator[]
entirely, and replace it with aVector.write
-like construct, such thatCoW
must be intentional. This change would avoid this kind of problem in the future.However, that will require quite a lot more work.