Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Font calculations by avoiding unnecessary copy-on-write. #102132

Merged
merged 1 commit into from
Feb 7, 2025

Conversation

Ivorforce
Copy link
Member

@Ivorforce Ivorforce commented Jan 28, 2025

I profiled scrolling across 4000 Node entries in the scene tree.
I was able to improve its performance by 20% by avoiding unnecessary copy-on-write calls.

Explanation

Array[i] has a const and a non-const implementation.
When a mutable instance is available, the non-const implementation is always chosen. This version implies a fork of the CowData (copy-on-write).
Font has a mutable TypedArray<RID> rids - because it is mutable, it is always non-const. For this reason, it will always cause forks on operator[] access.
The .get function does not have a non-const implementation. As such, it does not cause forks.

Profiler

I profiled the scrolling across 4000 items in the scene tree.
I used Apple Instruments' CPU profiler and inverted the call tree.

Before (after this PR, this entry is completely gone).

1.50 Gc  22,8 %	  CowData<Variant>::_copy_on_write()
1.48 Gc  22,4 %	   CowData<Variant>::ptrw()
1.48 Gc  22,4 %	    VectorWriteProxy<Variant>::operator[](long long)
1.48 Gc  22,4 %	     Array::operator[](int)
1.48 Gc  22,4 %	      Font::get_height(int) const
1.48 Gc  22,4 %	       Tree::compute_item_height(TreeItem*) const
917.26 Mc  13,9 %	        Tree::get_item_height(TreeItem*) const
917.26 Mc  13,9 %	         Tree::get_item_height(TreeItem*) const
917.26 Mc  13,9 %	          Tree::get_internal_min_size() const
917.26 Mc  13,9 %	           Tree::update_scrollbars()

Caveats

It would be better to eliminate the non-const operator[] entirely, and replace it with a Vector.write-like construct, such that CoW must be intentional. This change would avoid this kind of problem in the future.
However, that will require quite a lot more work.

Copy link
Contributor

@kiroxas kiroxas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch !

@fire
Copy link
Member

fire commented Jan 28, 2025

I believe that CoW forced on and the edge cases on a case-by-case issue were historical improvements. https://godotengine.org/article/why-we-broke-your-pr/ So I don't suggest flipping it back.

@Ivorforce
Copy link
Member Author

Ivorforce commented Jan 28, 2025

My suggestion would be to double down and do the same to Array as was done to Vector, for the same reason outlined by your linked article :) In other words, the opposite of flipping it back!
As mentioned however that would be a bigger change requiring touching more code.

@bruvzg
Copy link
Member

bruvzg commented Jan 29, 2025

This change should be completely safe, so might be worth moving it to 4.4 milestone.

Copy link
Member

@Calinou Calinou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally, it works as expected.

Scrolling the Scene tree dock with 16385 nodes in it feels noticeably smoother, especially near the bottom (tested on an Linux optimized editor build with LTO).

@clayjohn clayjohn modified the milestones: 4.x, 4.4 Feb 6, 2025
Comment on lines +493 to +496
if (TS->font_has_char(rids.get(i), p_char)) {
int32_t glyph = TS->font_get_glyph_index(rids.get(i), p_font_size, p_char, 0);
TS->font_draw_glyph(rids.get(i), p_canvas_item, p_font_size, p_pos, glyph, p_modulate);
return TS->font_get_glyph_advance(rids.get(i), p_font_size, glyph).x;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we gain anything from extracting rids.get(i) to a const variable, or will the compiler optimize it all the same?

Copy link
Member Author

@Ivorforce Ivorforce Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question!
Array.get has the overhead of a function call, an if statement that is likely to be predicted correctly (read_only), and another if statement that is likely to be predicted correctly (bad index). If the array is large, there will also be an unnecessary cache contention (through repeated access of size() and dereferencing ArrayPrivate).
All in all, it could be optimized, but it's unlikely to make an observable difference, since all these factors should add up to only ~20 wasted clock cycles1 for each get call in most cases - and draw_char isn't particularly problematic in the profiler right now anyway.

If I were to make another rebase, I'd probably cache the reference to be safe, but through this estimation I don't think it's particularly important.

Footnotes

  1. Napkin math via Agner Fog's heuristics.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

60 cycles gained for adding a single line seems like a good trade off to me :)

Copy link
Member Author

@Ivorforce Ivorforce Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually implemented this and was close to pushing, but then found this bug when checking related code to make sure it's safe.
It doesn't appear that Font.rids is read-only, but I didn't want to risk ruining a simple PR by introducing a potential regression, so I would like to keep it as-is for now. I plan to make a PR switching Font.rids to Vector for 4.5 anyway, and I'll just add the cache there I think.

@Repiteo Repiteo merged commit 568d628 into godotengine:master Feb 7, 2025
19 checks passed
@Repiteo
Copy link
Contributor

Repiteo commented Feb 7, 2025

Thanks!

@Ivorforce Ivorforce deleted the optimize-font-internal-cow branch February 7, 2025 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants