Add Span struct (replacing StrRange). Spans represent read-only access to a contiguous array, resembling std::span. #100293

Merged (1 commit) on Mar 10, 2025

Ivorforce
Member

@Ivorforce Ivorforce commented Dec 11, 2024

It is currently difficult to run algorithms on String, Vector, LocalVector data ranges or plain C arrays without copying the data first. This leads to inefficiencies.

This PR adds the Span class. A span represents a view into an array (it's a pointer and size).
With Span, it will be easier to implement functions with more agnosticism as to the memory storage, helping to reduce unnecessary copies. Additionally, Span will help bridge the gap between LocalVector and Vector, helping to address godotengine/godot-proposals#5144.
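To make the "pointer and size" idea concrete, here is a minimal sketch of such a view. It is not the merged implementation. The member names (`ptr()`, `size()`, `is_empty()`, `operator[]`, `begin()`/`end()`), the `uint64_t` length, and the helper `span_sum` are assumptions for illustration. The view is read-only, matching the const-only design described in this PR.

```cpp
#include <cstdint>

// Hypothetical minimal Span for illustration: a non-owning, read-only
// view made of a pointer and a length. It never allocates or copies.
template <typename T>
class Span {
	const T *_ptr = nullptr;
	uint64_t _len = 0;

public:
	constexpr Span() = default;
	constexpr Span(const T *p_ptr, uint64_t p_len) :
			_ptr(p_ptr), _len(p_len) {}

	constexpr const T *ptr() const { return _ptr; }
	constexpr uint64_t size() const { return _len; }
	constexpr bool is_empty() const { return _len == 0; }

	// Read-only access only: there is no non-const overload.
	constexpr const T &operator[](uint64_t p_idx) const { return _ptr[p_idx]; }

	// begin()/end() enable range-based for loops over the view.
	constexpr const T *begin() const { return _ptr; }
	constexpr const T *end() const { return _ptr + _len; }
};

// Hypothetical helper: an algorithm written once against Span works for any
// contiguous storage (C arrays, Vector, LocalVector, String data) without
// copying that storage first.
template <typename T>
constexpr T span_sum(Span<T> p_span) {
	T total = T();
	for (const T &value : p_span) {
		total += value;
	}
	return total;
}
```

For example, `Span<int>(values, 4)` and `Span<int>(values + 1, 2)` view the whole array and its middle two elements, both without copying.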

Span is similar to StrRange, which it replaces, but is meant for more than just strings. It is also similar to VectorView, which will be replaced in a future PR.

Example

The String.path_to implementation currently has the following lines of code:

godot/core/string/ustring.cpp

Lines 5108 to 5109 in c2e4ae7

Vector<String> src_dirs = src.substr(1, src.length() - 2).split("/");
Vector<String> dst_dirs = dst.substr(1, dst.length() - 2).split("/");

Here, substr first selects part of the String, which makes a copy.
Then split is called on that copy, which makes a further copy of every region of the string.
None of the regions are modified in the process; the copies are made only because there is currently no way to avoid them.

An optimized implementation could look like this:

LocalVector<Span<char32_t>> src_dirs = spans::split(src.span().subspan(1, src.length() - 2), U'/'); 
LocalVector<Span<char32_t>> dst_dirs = spans::split(dst.span().subspan(1, dst.length() - 2), U'/'); 

In this implementation, no copies of the string need to be made, because all Spans used here are views into the original string (spans::split and subspan would need to be added as functions).
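A sketch of how such a split could return views instead of copies, assuming hypothetical `Span::subspan` and `spans::split` functions (neither exists yet, as noted above). `std::vector` stands in for `LocalVector` so the example is self-contained, and empty segments are kept.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minimal Span, repeated here so the sketch is self-contained.
template <typename T>
class Span {
	const T *_ptr = nullptr;
	uint64_t _len = 0;

public:
	Span() = default;
	Span(const T *p_ptr, uint64_t p_len) :
			_ptr(p_ptr), _len(p_len) {}

	const T *ptr() const { return _ptr; }
	uint64_t size() const { return _len; }
	const T &operator[](uint64_t p_idx) const { return _ptr[p_idx]; }

	// Hypothetical: a view of p_len elements starting at p_from.
	// Only the pointer and length change; no element is copied.
	Span subspan(uint64_t p_from, uint64_t p_len) const {
		assert(p_from <= _len && p_len <= _len - p_from);
		return Span(_ptr + p_from, p_len);
	}
};

namespace spans {
// Hypothetical split: returns views into p_span, one per delimiter-separated
// segment (empty segments included), without copying any characters.
template <typename T>
std::vector<Span<T>> split(Span<T> p_span, const T &p_delimiter) {
	std::vector<Span<T>> parts;
	uint64_t start = 0;
	for (uint64_t i = 0; i < p_span.size(); i++) {
		if (p_span[i] == p_delimiter) {
			parts.push_back(p_span.subspan(start, i - start));
			start = i + 1;
		}
	}
	parts.push_back(p_span.subspan(start, p_span.size() - start));
	return parts;
}
} // namespace spans
```

Applied to a buffer holding `U"/home/user/"`, `subspan(1, 9)` strips the outer slashes, and `spans::split(..., U'/')` yields two views, `home` and `user`, that both point into the original buffer.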

Discussion

Span is named after C++20's std::span.
Span is const-only because a need for mutable spans is not obvious yet, and it is easier to use if it is const by default. If mutable spans are needed in the future, they can be proposed then.

@Ivorforce Ivorforce marked this pull request as ready for review December 11, 2024 22:43
@Ivorforce Ivorforce requested a review from a team as a code owner December 11, 2024 22:43
@Ivorforce Ivorforce force-pushed the buffer-view branch 3 times, most recently from e3be56f to b8c9998 Compare December 11, 2024 23:18
@Ivorforce
Member Author

Ivorforce commented Dec 12, 2024

Looks like the builds are failing.
The Windows failure is a repeat of a problem I had in #99806 already. It can be fixed by touching the file to wipe caches.
I have no idea what the Linux problem is about though, since tests are completing on my own machine.
Edit: Nevermind, I got it.

@RandomShaper
Member

There's a VectorView somewhere in the rendering code (in rendering_device_commons.h, I think) that seems to have the same goal. Also, I barely remember there's a proposal around related to this.

@Ivorforce
Member Author

There's a VectorView somewhere in the rendering code (in rendering_device_commons.h, I think) that seems to have the same goal. Also, I barely remember there's a proposal around related to this.

Oh yeah, I see it.
I think it would be best to consolidate that into BufferView once merged, in a follow-up PR.

@Ivorforce
Member Author

Ivorforce commented Dec 12, 2024

To avoid the MSVC compiler issue (which I have previously determined to be a cache issue), I am touching feed_effects.h by inserting an explicit include to a well-known file.
I think it may be caused by the only include of the file being a generated file, possibly leading to the cache resolver running into a race condition when compiling the file. I hope this change can fix the problem for good (though we won't know for sure until future PRs since touching the file already invalidates the cache, fixing the issue).

@Ivorforce Ivorforce changed the title from "Rename StrRange -> BufferView. Move find, rfind and contains functions from CowData to BufferView." to "Rename StrRange -> Span. Move find, rfind and contains functions from CowData to Span." Dec 12, 2024
@Ivorforce
Member Author

Ivorforce commented Dec 12, 2024

I renamed BufferView -> Span since that's what C++20 calls the concept. And I shrunk the implementation to be bare-bones as we can still add the rest of what we need later.

@Ivorforce Ivorforce force-pushed the buffer-view branch 2 times, most recently from 967b48e to 49fda0d Compare December 17, 2024 16:11
Contributor

@Repiteo Repiteo left a comment


I'd encourage adding tests as well. Seeing as you're making a new template, you'd be free to make several functions constexpr for compile-time sanity checks:

TEST_CASE("[Span] Constant Validators") {
	constexpr Span<uint8_t> span_empty;
	static_assert(span_empty.size() == 0);
	static_assert(span_empty.is_empty());

	// `static` gives the byte a fixed address, so `&byte` is a constant expression.
	static constexpr uint8_t byte = 0;
	constexpr Span<uint8_t> span_byte = Span<uint8_t>(&byte, 1);
	static_assert(span_byte.size() == 1);
	static_assert(!span_byte.is_empty());
}

@Ivorforce
Member Author

Ivorforce commented Dec 17, 2024

I like this test! I'll add some more as well.
I'm also realizing I can make every single function currently in Span constexpr. Not only is that great for the compiler, it also means we don't need a single runtime test (yet) 😄
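As a sketch of what "every function constexpr" enables: if the `find`, `rfind` and `contains` functions mentioned in the earlier PR title were constexpr, they could be checked entirely at compile time. The signatures below (an `int64_t` return with `-1` for "not found", and the `p_from` parameter) are assumptions for illustration, not the merged API.

```cpp
#include <cstdint>

// Hypothetical constexpr Span with find()/rfind()/contains(), illustrating
// how static_assert can stand in for runtime tests.
template <typename T>
class Span {
	const T *_ptr = nullptr;
	uint64_t _len = 0;

public:
	constexpr Span() = default;
	constexpr Span(const T *p_ptr, uint64_t p_len) :
			_ptr(p_ptr), _len(p_len) {}

	constexpr uint64_t size() const { return _len; }
	constexpr bool is_empty() const { return _len == 0; }

	// Index of the first match at or after p_from, or -1 if there is none.
	constexpr int64_t find(const T &p_val, uint64_t p_from = 0) const {
		for (uint64_t i = p_from; i < _len; i++) {
			if (_ptr[i] == p_val) {
				return static_cast<int64_t>(i);
			}
		}
		return -1;
	}

	// Index of the last match, or -1 if there is none.
	constexpr int64_t rfind(const T &p_val) const {
		for (uint64_t i = _len; i > 0; i--) {
			if (_ptr[i - 1] == p_val) {
				return static_cast<int64_t>(i - 1);
			}
		}
		return -1;
	}

	constexpr bool contains(const T &p_val) const { return find(p_val) != -1; }
};

// Data with static storage duration, so its address is a constant expression.
static constexpr uint8_t bytes[] = { 3, 1, 4, 1, 5 };
constexpr Span<uint8_t> bytes_span(bytes, 5);

static_assert(bytes_span.find(1) == 1, "first 1 is at index 1");
static_assert(bytes_span.rfind(1) == 3, "last 1 is at index 3");
static_assert(!bytes_span.contains(9), "9 is absent");
static_assert(Span<uint8_t>().find(1) == -1, "an empty span finds nothing");
```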

@@ -31,6 +31,7 @@
#ifndef SORT_EFFECTS_RD_H
#define SORT_EFFECTS_RD_H

#include "servers/rendering/renderer_rd/shader_rd.h"
Member Author

@Ivorforce Ivorforce Jan 16, 2025


Quick fix due to build failing because of erroneous cache restores.

I have previously hypothesized (#100293 (comment)) that explicitly including a non-generated file solves such cache issues for good.

Contributor

@Repiteo Repiteo left a comment


Codestyle checks out & the use of more modern C++ concepts is a welcome addition

Member

@clayjohn clayjohn left a comment


I think this is ready to go. But we need to settle the size_t debate. Personally I lean towards just making it a uint64_t and calling it a day.

I appreciate the reasons for size_t, but I think the pain of having a changing size isn't going to be worth the slight optimization for 32-bit systems. If you look at download numbers, almost no one uses the 32-bit version of Godot. Ultimately, we need to prioritize developer ergonomics in this case. Having a fixed size that we can count on, and that matches our equivalent container types, is IMO the most important consideration.

CC @lawnjelly @hpvb It would be great for you two to weigh in with your thoughts sometime soon so we can merge this and begin the process of migrating over to Span

@Repiteo
Contributor

Repiteo commented Mar 8, 2025

size_t wouldn't change based on bitcount, you're thinking of uintptr_t

@akien-mga
Member

akien-mga commented Mar 8, 2025

size_t wouldn't change based on bitcount, you're thinking of uintptr_t

size_t is definitely architecture dependent: https://stackoverflow.com/a/918909

@lawnjelly
Member

size_t wouldn't change based on bitcount, you're thinking of uintptr_t

Yes size_t is imo well worth avoiding in the codebase specifically because it is afaik architecture dependent and we are multiplatform. Most programmers have moved to fixed width types these days because they avoid a host of problems with the old style types (can change the size of structures, padding etc, implicit conversions). (Afaik there can be some small exceptions in optimized tight loops, but these are few and far between)

This is the same reason we prefer int32_t over int where possible.

@lawnjelly
Member

lawnjelly commented Mar 8, 2025

My thoughts here would be not to use size_t because non-fixed type.
I would argue for templating the size like we do in LocalVector.

Personally I would also default U to uint32_t like in LocalVector (some people like to use signed types for counters, but I won't get into that war here). Span is to be used heavily internally and we want to avoid any penalties on any architectures when converting to 32 bit (if loop counters are 32 bit).

If you default to 64 bit, you then may end up having to do this (or the equivalent cast, either explicit or implicit):

uint32_t count = my_span.size();
for (uint32_t n=0; n<count; n++)
{
...
}

As well as extra code, this now creates a bug on every use case - what happens if the size IS larger than 32 bit? Same problem occurs with a cast. Are you going to write error checking code in every loop? Or are you going to force every loop to 64 bit just in case?
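A small sketch of the hazard described above: narrowing a 64-bit size into a 32-bit counter silently keeps only the low 32 bits. The helper names `narrow_unchecked` and `fits_in_u32` are hypothetical and used only for this illustration.

```cpp
#include <cstdint>
#include <limits>

// What `uint32_t count = my_span.size();` does when size() is 64-bit:
// the value is reduced modulo 2^32 without any diagnostic.
constexpr uint32_t narrow_unchecked(uint64_t p_size) {
	return static_cast<uint32_t>(p_size);
}

// The check every such loop would need in order to be safe.
constexpr bool fits_in_u32(uint64_t p_size) {
	return p_size <= std::numeric_limits<uint32_t>::max();
}

// 5,000,000,000 does not fit in 32 bits and wraps to 705,032,704, so a
// uint32_t loop over such a span would silently skip most elements.
static_assert(narrow_unchecked(5000000000ULL) == 705032704u, "silent wrap");
```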

You also potentially have to make all your structures 64 bit just in case, and it can spread through the codebase like a NaN.

The actual real world cases for 64 bit in loops are few and far between (memory addresses, large videos, files, not a lot else), and imo in those cases it is better to use a 64 bit version explicitly (e.g. with the templated U create a Span64 which can be used as a parameter for functions).

In general I have seen some opinions that 64 bit is free but afaik it isn't - quite apart from any CPU costs it also has the potential to bloat structures and data, on both 32 bit and 64 bit platforms.

(sorry I've read back and realised I was mostly repeating what I said before in earlier comments from a couple months back .. 😁 )
Overall imo:

  • size_t - hard no from me
  • 64 bit default - mistake imo, but less bad than size_t

BTW this is all in my opinion and what I personally would go for, since I was asked for it above. What you guys choose to use is up to you (it likely won't affect me much, except in terms of forward ports).

@clayjohn
Member

clayjohn commented Mar 8, 2025

@lawnjelly When I wrote my comment I was forgetting that LocalVector uses uint32_t. I thought that we had standardized on uint64_t for both LocalVector and Vector. Looking through the codebase it seems that LocalVector is templated and can take another type for its size, but it appears that, in practice, it always uses uint32_t.

The only concern I have for templating is the edge case where spans are parameters to methods that can take either Vectors or LocalVectors. If Span is templated, then those methods need to be templated too, which will cause a lot of pain

@Repiteo
Contributor

Repiteo commented Mar 8, 2025

size_t is definitely architecture dependent: https://stackoverflow.com/a/918909

I'll be damned! I always interpreted the specification as equivalent to a max possible size, but maybe I crossed some wires with uintmax_t.

In any case, I'm not totally against size_t. Despite being a fringe type in theory, it remains the type used for virtually all native size operations in C/C++. Granted, I would extend this point to say that we should be using size_t in virtually all of these cases, but that's well beyond the scope of this PR and probably isn't a hill worth dying on.

It's also worth noting that there already was a transition to 64-bit for CowData in #86730, which plenty of container types build on top of. It's implemented in a somewhat archaic manner with Size/USize typedefs, but the core is uint64_t. If we want Span to be "just" 32-bit, it would be worth looking into the problems that were initially solved by the 64-bit conversion. But this might be more of a topic for a broader examination of the different sizes we use across templates, let alone the codebase.

@lawnjelly
Member

The only concern I have for templating is the edge case where spans are parameters to methods that can take either Vectors or LocalVectors.

Just trying to guess what you mean here: do you mean the case where we default to u32 and, e.g., a Vector tries to pass a size larger than u32 into a Span32 (just calling it that here for disambiguation)?

We could just check this on the span constructor that it's within the 32 bit range.

For functions that are likely to need 64 bit input just make them take Span64 as a parameter instead of Span32. Job done. Span64 could be created automatically from Vector or LocalVector just like Span32, but doesn't need the size check on construction.
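One possible shape for this suggestion, as a sketch: a span templated on its size type `U` (defaulting to `uint32_t`, like LocalVector), with a range check when constructed from a 64-bit length. `SizedSpan`, `Span32` and `Span64` are hypothetical names, and falling back to an empty span on overflow is an assumption for illustration; an engine would more likely report an error.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical span templated on its size type U.
template <typename T, typename U = uint32_t>
class SizedSpan {
	const T *_ptr = nullptr;
	U _len = 0;

public:
	constexpr SizedSpan() = default;

	// Construct from a 64-bit length (e.g. coming from a 64-bit Vector).
	// The check can only fail when U is narrower than 64 bits.
	constexpr SizedSpan(const T *p_ptr, uint64_t p_len) :
			_ptr(p_ptr), _len(static_cast<U>(p_len)) {
		if (p_len > std::numeric_limits<U>::max()) {
			// Sketch only: fall back to an empty span instead of wrapping.
			_ptr = nullptr;
			_len = 0;
		}
	}

	constexpr U size() const { return _len; }
	constexpr const T *ptr() const { return _ptr; }
};

template <typename T>
using Span32 = SizedSpan<T, uint32_t>;
template <typename T>
using Span64 = SizedSpan<T, uint64_t>;
```

Note that `Span32<T>` and `Span64<T>` are distinct types, so a function taking one does not accept the other without a conversion or a template parameter.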

It is possible that on modern hardware there are fewer CPU penalties from 64 bit, so maybe we could get away with it, but bear in mind that even the size of counters is doubled, as are stack size, storage size, etc. I come from the school of making use of every byte, so it's hard to adjust to, especially given how rarely 64 bit is actually required. 😁

Remember, a 32-bit count is enough to access 4 GB of byte-sized units,
16 GB of uint32_t units,
48 GB of Vector3 (float) units,
96 GB of Vector3 (double) units.
Outside of specialist areas like files and video, those kind of sizes are going to run so slow you should be using chunked access anyway. The cache will likely be dying. They aren't going to be super useful for games imo. Which is why I would treat 64 bit as special case.
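The figures above follow from multiplying the 32-bit element limit by the element size. Here is a compile-time check of that arithmetic, assuming 4-byte `float`, 8-byte `double`, and a Vector3 of three tightly packed components. The limit is taken as 2^32 elements, which is approximate: the largest count a `uint32_t` can hold is 2^32 - 1. GB here means 2^30 bytes.

```cpp
#include <cstdint>

static_assert(sizeof(float) == 4 && sizeof(double) == 8, "assumed widths");

// Approximate element limit of a 32-bit count (exact maximum is 2^32 - 1).
constexpr uint64_t approx_max_elements_32 = uint64_t(1) << 32;
constexpr uint64_t GiB = uint64_t(1) << 30;

// Bytes addressable with a 32-bit element count for a given element size.
constexpr uint64_t capacity_bytes(uint64_t p_element_size) {
	return approx_max_elements_32 * p_element_size;
}

static_assert(capacity_bytes(1) == 4 * GiB, "bytes");
static_assert(capacity_bytes(sizeof(uint32_t)) == 16 * GiB, "uint32_t");
static_assert(capacity_bytes(3 * sizeof(float)) == 48 * GiB, "Vector3 (float)");
static_assert(capacity_bytes(3 * sizeof(double)) == 96 * GiB, "Vector3 (double)");
```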

We presumably also have a lot of existing code that uses 32 bit counters, and if swapping to 64 bit we should probably also consider swapping loops to e.g.:

for (uint64_t n=0; n<span.size(); n++)

otherwise each of these would be an overflow potential, and we should also evaluate whether there are implicit conversion costs when comparing 32 to 64 bit.

So in summary, I personally think 64 bit counters throughout core is madness, but go for it if you're feeling lucky. 😉

User code is another matter. Bound languages are going to be so slow the 32/64 bit makes no odds, so I'm not surprised it was used in bound containers.

@YYF233333
Contributor

Some thoughts: I think we could store a size_t value internally in Span while exposing an int64_t interface externally. The advantage of this approach is that Span itself will always take up exactly two machine words without padding. Additionally, the size returned to the caller can be directly compared with int without warnings (which is an issue I often encounter when replacing Vector with LocalVector). This means existing loops using int as an index will continue to compile without modification. While, in theory, arrays larger than INT64_MAX could exist, in practice, no machine has that much memory yet.
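A sketch of this suggestion, under the stated assumptions: the length is stored as `size_t`, and `size()` returns a signed `int64_t`. The struct is then two machine words (pointer plus `size_t`, no padding on common ABIs). Because `size()` returns a signed type, an existing `int` loop index compares against it without a signed/unsigned mismatch. The class layout and the helper `sum_with_int_index` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical Span: size_t storage, signed int64_t interface.
template <typename T>
class Span {
	const T *_ptr = nullptr;
	size_t _len = 0;

public:
	constexpr Span() = default;
	constexpr Span(const T *p_ptr, size_t p_len) :
			_ptr(p_ptr), _len(p_len) {}

	// Signed result: comparing with an `int` index is signed vs. signed,
	// so no -Wsign-compare warning is triggered.
	constexpr int64_t size() const { return static_cast<int64_t>(_len); }
	constexpr const T &operator[](int64_t p_idx) const { return _ptr[p_idx]; }
};

// An existing-style loop with an `int` index keeps compiling cleanly.
template <typename T>
constexpr T sum_with_int_index(Span<T> p_span) {
	T total = T();
	for (int i = 0; i < p_span.size(); i++) {
		total += p_span[i];
	}
	return total;
}
```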

@Ivorforce
Member Author

Ivorforce commented Mar 9, 2025

Alright, thanks for all the perspectives!
I'll just add a few quick words of my own, then summarize where we are.

For me, uint32_t is a hard no. Practically, it means we'd be declaring now: Godot Vector will never be able to hold more than 4GB of raw buffer data. I think that's a pretty hard sell.

As for @lawnjelly's templates: I think that doesn't really solve any problems, because either 32-bit or 64-bit would have to be the default, and be used for Vector and the like. Defaulting to 64 bit with an option to use 32 bit instead makes little sense to me. But defaulting to 32 bit data brings the aforementioned problems, and only GDExtension code would be able to make use of larger data (and jump through hoops to do it).

If you default to 64 bit, you then may end up having to do this (or the equivalent cast, either explicit or implicit): [...]

Generally, the speed of uint32_t and uint64_t is practically the same in most situations. I think costs like conversions inside loops are unlikely to be a bottleneck. But my main argument here is that there's little reason to iterate with uint32_t in the first place if we go with 64-bit indices.

As a final note, I'd still prefer size_t, but I'm totally ok with uint64_t, for the reasons @clayjohn mentioned about developer ergonomics.

Summary (for now)

(Edit: reading through this again, it's admittedly biased, but I hope it does the job 😅)

size_t

  • Fast and small on both 32 bit and 64 bit
  • System library equivalent of sizes
  • Discrepancy between 32 bit and 64 bit systems; may need special handling
  • Not currently in use for any containers

uint32_t

  • Limited to 4GB byte arrays (more for other data types)
  • Good on 32 bit (rare), mostly wasteful on 64 bit (common)
  • int64_t must be used for indexing (otherwise 2GB max)
  • LocalVector uses this right now (by default)

uint64_t

  • Good on 64 bit (common), wasteful on 32 bit (rare)
  • CowData / Vector / String use this right now

Conclusion

Taking everyone's opinion into account, the least problematic contender seems to be uint64_t, and we can add some sanity checks on 32 bit systems to make sure we don't accidentally try to create spans that don't even fit in memory.
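One possible form of that sanity check, as a sketch: with a `uint64_t` length, reject any length whose byte size cannot be represented in `size_t`. On a 32-bit build that rules out spans larger than the address space; on 64-bit it guards against byte-size overflow. `span_length_fits_in_memory` is a hypothetical helper, and leaving the span empty on failure is an assumption for illustration (an engine would likely raise an error instead).

```cpp
#include <cstdint>

// Hypothetical helper: true if p_len elements of T have a byte size that
// fits in size_t, i.e. could exist in this platform's address space.
template <typename T>
constexpr bool span_length_fits_in_memory(uint64_t p_len) {
	return p_len <= static_cast<uint64_t>(SIZE_MAX) / sizeof(T);
}

// Hypothetical uint64_t-sized Span whose constructor applies the check.
template <typename T>
class Span {
	const T *_ptr = nullptr;
	uint64_t _len = 0;

public:
	constexpr Span() = default;
	constexpr Span(const T *p_ptr, uint64_t p_len) {
		// Sketch only: an impossible length leaves the span empty.
		if (span_length_fits_in_memory<T>(p_len)) {
			_ptr = p_ptr;
			_len = p_len;
		}
	}

	constexpr uint64_t size() const { return _len; }
};
```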

I'll adjust the PR accordingly. Feel free to approve the PR based on this change, or reject if you don't think the discussion has sufficiently covered all grounds (especially looking at @lawnjelly). We can always make the final decision in the next core meeting :)

Member

@clayjohn clayjohn left a comment


Looks good on my end!

@Repiteo
Contributor

Repiteo commented Mar 9, 2025

I'm open to revisiting size_t in the future, but it should be as a dedicated proposal/PR. Otherwise, my previous approval stands

Member

@lawnjelly lawnjelly left a comment


Barring my preference for 32 bit, it looks great, but I can see I'm outvoted on that and it's not a hill to die on.

As long as I get bragging rights when you later realise 32 bit was a better choice. 👍 😉

@Repiteo Repiteo merged commit 1901d7d into godotengine:master Mar 10, 2025
20 checks passed
@Repiteo
Contributor

Repiteo commented Mar 10, 2025

Thanks!

@Ivorforce Ivorforce deleted the buffer-view branch March 10, 2025 15:16