
[Draft] Variant alloc #100861

Draft · wants to merge 1 commit into master from variant-alloc

Conversation

@hpvb (Member) commented Dec 27, 2024

New allocator for Variant.

Performance improvements were measured using the TPS demo: start the level and don't touch anything. Random spawn points have been removed.

Before:

Project FPS: 174 (5.74 mspf)
Project FPS: 168 (5.95 mspf)
Project FPS: 171 (5.84 mspf)
Project FPS: 175 (5.71 mspf)
Project FPS: 166 (6.02 mspf)
Project FPS: 167 (5.98 mspf)
Project FPS: 158 (6.32 mspf)
Project FPS: 168 (5.95 mspf)
Project FPS: 163 (6.13 mspf)
Project FPS: 166 (6.02 mspf)
Project FPS: 167 (5.98 mspf)
Project FPS: 167 (5.98 mspf)
Project FPS: 166 (6.02 mspf)
Project FPS: 173 (5.78 mspf)

avg: 5.96

After:

Project FPS: 184 (5.43 mspf)
Project FPS: 185 (5.40 mspf)
Project FPS: 179 (5.58 mspf)
Project FPS: 186 (5.37 mspf)
Project FPS: 190 (5.26 mspf)
Project FPS: 183 (5.46 mspf)
Project FPS: 184 (5.43 mspf)
Project FPS: 179 (5.58 mspf)
Project FPS: 183 (5.46 mspf)
Project FPS: 180 (5.55 mspf)
Project FPS: 176 (5.68 mspf)
Project FPS: 177 (5.64 mspf)
Project FPS: 179 (5.58 mspf)
Project FPS: 185 (5.40 mspf)

avg: 5.49

@hpvb hpvb requested a review from a team as a code owner December 27, 2024 21:29
@hpvb hpvb marked this pull request as draft December 27, 2024 21:30
@hpvb hpvb added topic:core and removed confirmed labels Dec 27, 2024
@hpvb hpvb force-pushed the variant-alloc branch 3 times, most recently from 2a9401c to ceb6dec on December 27, 2024 21:49
@hpvb hpvb force-pushed the variant-alloc branch 2 times, most recently from 51884a0 to d4d2ee5 on December 27, 2024 22:39
@radiantgurl (Contributor) commented Dec 27, 2024

Any idea if this is faster when we're on mimalloc? (compared with and without this)

@Repiteo (Contributor) commented Dec 28, 2024

core/templates/variant_allocator.h should either be relocated to core/variant/variant_allocator.h or given a more generic name (assuming it could be applied beyond Variant)

@Repiteo Repiteo added this to the 4.x milestone Dec 28, 2024
@hpvb (Member Author) commented Dec 28, 2024

> Any idea if this is faster when we're on mimalloc? (compared with and without this)

The main thing this allocator does is reduce the amount of time spent under lock. PagedAllocator is protected by a global lock; VariantAllocator has thread-local locks and only needs the global lock roughly once every 64 allocations.
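
To illustrate the structure described above, here is a rough sketch written under assumptions rather than the PR's actual code: class and member names are invented, std::mutex stands in for Godot's SpinLock, a std::bitset stands in for the bitmask, and the free path is omitted. The point it shows is that each thread owns its own 64-slot slabs, so an allocation normally touches only thread-local state, and the global lock is taken only when a fresh slab has to be acquired.

#include <bitset>
#include <cstddef>
#include <mutex>
#include <new>
#include <vector>

template <typename T>
class ThreadLocalSlabSketch {
	struct Slab {
		alignas(T) unsigned char storage[64 * sizeof(T)];
		std::bitset<64> used; // the real code would use a uint64_t mask with ffs/popcount
	};

	struct PerThread {
		std::mutex local_lock; // contended only when another thread frees into this slab set
		std::vector<Slab *> slabs;
	};

	static inline std::mutex global_lock; // hit roughly once per 64 allocations
	static inline std::vector<Slab *> spare_slabs;

	static Slab *acquire_slab() {
		std::lock_guard<std::mutex> g(global_lock);
		if (!spare_slabs.empty()) {
			Slab *s = spare_slabs.back();
			spare_slabs.pop_back();
			return s;
		}
		return new Slab();
	}

public:
	T *alloc() {
		thread_local PerThread tls;
		std::lock_guard<std::mutex> l(tls.local_lock);
		for (Slab *s : tls.slabs) {
			for (size_t i = 0; i < 64; ++i) {
				if (!s->used.test(i)) {
					s->used.set(i);
					return new (s->storage + i * sizeof(T)) T();
				}
			}
		}
		// All thread-local slabs are full: take the global lock once for a new slab.
		Slab *s = acquire_slab();
		tls.slabs.push_back(s);
		s->used.set(0);
		return new (s->storage) T();
	}
};

int main() {
	ThreadLocalSlabSketch<int> a;
	int *p = a.alloc(); // only the first allocation on this thread touches the global lock
	*p = 1;
}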

@hpvb (Member Author) commented Dec 28, 2024

> core/templates/variant_allocator.h should either be relocated to core/variant/variant_allocator.h or given a more generic name (assuming it could be applied beyond Variant)

This is still a draft; I am using this PR to make sure that it passes all the tests, particularly on MSVC. The plan is to replace paged_allocator with this entirely, but I still have to think about how to do the non-thread-safe version of it. I might end up splitting it in two, making an even simpler one for the thread-safe version.

@MewPurPur (Contributor)

> The plan is to replace paged_allocator with this entirely

I should ask, are you aware of #97016?

@hpvb (Member Author) commented Dec 28, 2024

> The plan is to replace paged_allocator with this entirely
>
> I should ask, are you aware of #97016?

I was not aware of it, but that one also has a global lock, so I don't think it'll be faster than this.

EDIT: The other PR's goals and usage are a lot different in general. I don't really think it is worth comparing the two.

@Repiteo Repiteo (Contributor) left a comment

> This is still a draft, I am using this PR to make sure that it passes all the tests, particularly on MSVC. The plan is to replace paged_allocator with this entirely. But I have to still think about how to do the non-threadsafe version of it. I might end up splitting it in two, making an even simpler one for the thread-safe version.

As in, a drop-in replacement for PagedAllocator? If so, that'll take care of the name concern entirely; but if not, we can cross that bridge once this is out of the draft stage. I've got a few codestyle tweaks in the meantime.

Comment on lines 47 to 51
#if (__has_builtin(__builtin_popcountll))
#define builtin_popcountll
#endif
#if (__has_builtin(__builtin_ffsll))
#define builtin_ffsll
Contributor:

Because we don't know for certain if uint64_t is unsigned long long, I'd rather use the generic equivalents:

Suggested change:
-#if (__has_builtin(__builtin_popcountll))
-#define builtin_popcountll
-#endif
-#if (__has_builtin(__builtin_ffsll))
-#define builtin_ffsll
+#if __has_builtin(__builtin_popcountg)
+#define builtin_popcountg
+#endif
+#if __has_builtin(__builtin_ffsg)
+#define builtin_ffsg

Member Author (@hpvb):

clang doesn't have __builtin_ffsg or at least the version I use doesn't. I have added some static assertions to ensure the types are equivalent.

#if defined(__GNUC__) || defined(__clang__)
#if (__has_builtin(__builtin_popcountll))
#define builtin_popcountll
static_assert(sizeof(unsigned long long) == sizeof(uint64_t), "uint64_t and unsigned long long must have the same size");
static_assert((uint64_t)(-1) == (unsigned long long)(-1), "uint64_t and unsigned long long must have the same representation");
#endif
#if (__has_builtin(__builtin_ffsll))
#define builtin_ffsll
static_assert(sizeof(long long) == sizeof(int64_t), "int64_t and long long must have the same size");
static_assert((int64_t)(-1) == (long long)(-1), "int64_t and long long must have the same representation");
#endif
#endif

Does this work for you?

Contributor:

I like the idea, but this would entirely lock out those with unsigned long as their uint64_t. If we don't have a generic, then __builtin_popcountll should do the trick; I was just concerned about unnecessary conversion on those platforms (or maybe that's optimized out; I'm ignorant on compiler magic).

Alternatively, if we still want to cover all of our bases, we could try this:

#if defined(__GNUC__) || defined(__clang__)
#if std::is_same_v<unsigned long long, uint64_t>
#if __has_builtin(__builtin_popcountll)
#define builtin_popcount_uint64 __builtin_popcountll
#endif // __has_builtin(__builtin_popcountll)
#if __has_builtin(__builtin_ffsll)
#define builtin_ffs_uint64 __builtin_ffsll
#endif // __has_builtin(__builtin_ffsll)
#else
#if __has_builtin(__builtin_popcountl)
#define builtin_popcount_uint64 __builtin_popcountl
#endif // __has_builtin(__builtin_popcountl)
#if __has_builtin(__builtin_ffsl)
#define builtin_ffs_uint64 __builtin_ffsl
#endif // __has_builtin(__builtin_ffsl)
#endif // std::is_same_v<unsigned long long, uint64_t>
#endif // defined(__GNUC__) || defined(__clang__)

…Though this is really excessive & probably not necessary.

@hpvb (Member Author) commented Dec 28, 2024

> This is still a draft, I am using this PR to make sure that it passes all the tests, particularly on MSVC. The plan is to replace paged_allocator with this entirely. But I have to still think about how to do the non-threadsafe version of it. I might end up splitting it in two, making an even simpler one for the thread-safe version.
>
> As in, a drop-in replacement for PagedAllocator? If so, that'll take care of the name concern entirely; but if not, we can just cross that bridge once this is out of the draft stage. Got a few codestyle tweaks in the meantime

Yeah, this will be a drop-in replacement for PagedAllocator, at least for the thread_safe=true version of it.
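
For context on what a drop-in replacement implies at call sites, here is a sketch, not the PR's code: Godot call sites use PagedAllocator through a small surface, essentially alloc() and free(), so a replacement only has to keep that shape. The stub below just forwards to new/delete to make the call-site shape visible; the real replacement would be the slab allocator from this PR, and the class name here is invented.

#include <utility>

template <typename T>
struct PagedAllocatorShapedStub {
	template <typename... Args>
	T *alloc(Args &&...p_args) { return new T(std::forward<Args>(p_args)...); }
	void free(T *p_mem) { delete p_mem; }
};

int main() {
	PagedAllocatorShapedStub<int> pool; // previously e.g. PagedAllocator<int, true> pool;
	int *v = pool.alloc(42);
	pool.free(v);
}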

@Nazarwadim (Contributor)

> The plan is to replace paged_allocator with this entirely
>
> I should ask, are you aware of #97016?
>
> I was not aware of it, but that one also has a global lock, so I don't think it'll be faster than this.
>
> EDIT: The other PRs goals and use is a lot different in general. I don't really think it is worth comparing the two.

This can actually be done using StaticBlockAllocator and thread_local logic.

I will try to do this. If everything works, I will make a PR.

@hpvb (Member Author) commented Dec 31, 2024

> The plan is to replace paged_allocator with this entirely
>
> I should ask, are you aware of #97016?
>
> I was not aware of it, but that one also has a global lock, so I don't think it'll be faster than this.
> EDIT: The other PRs goals and use is a lot different in general. I don't really think it is worth comparing the two.
>
> This can actually be done using StaticBlockAllocator and thread_local logic.
>
> I will try to do this. If everything will work, then I will make PR.

The block allocator still has different characteristics than this one, though. The slab allocator in this PR is very fast; it spends almost no time under lock.

I don't think an allocator intended for more generic use is going to beat this one, especially given Godot's allocation patterns. The vast majority of allocation happens on the main thread, but deallocation happens all over the place. A benchmark needs to actually reproduce Godot's allocation/deallocation patterns.

A synthetic benchmark that tries to stress the allocator by massively allocating on multiple threads isn't going to scale to Godot.
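
As a rough illustration of that pattern (a sketch under assumptions, not a real benchmark: malloc/free and a mutex-guarded vector are placeholders for the allocator under test and for the cross-thread handoff), a representative micro-benchmark would allocate mostly on one thread and free on another, rather than hammering allocations from every thread.

#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

int main() {
	std::mutex queue_lock;
	std::vector<void *> to_free;
	bool done = false;

	// "Worker" thread: frees memory that was allocated elsewhere.
	std::thread consumer([&] {
		for (;;) {
			void *p = nullptr;
			{
				std::lock_guard<std::mutex> g(queue_lock);
				if (!to_free.empty()) {
					p = to_free.back();
					to_free.pop_back();
				} else if (done) {
					return;
				}
			}
			if (p) {
				std::free(p); // deallocation happens off the allocating thread
			}
		}
	});

	// "Main" thread: does the bulk of the allocations.
	for (int i = 0; i < 100000; ++i) {
		void *p = std::malloc(24); // stand-in for allocating a Variant bucket
		std::lock_guard<std::mutex> g(queue_lock);
		to_free.push_back(p);
	}
	{
		std::lock_guard<std::mutex> g(queue_lock);
		done = true;
	}
	consumer.join();
}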

@Nazarwadim (Contributor)

> The slab allocator of this PR is very fast, it spends almost no time under lock.

Time spent under lock does not by itself determine multithreaded performance. The bottleneck is the SpinLock itself and the allocator data, since they are shared between threads.

In StaticBlockAllocator, I added a test to count thread lock contention during allocation/free (I tested https://github.com/godotengine/godot-demo-projects/tree/master/3d/voxel and tps-demo).
For testing, I created a custom SpinLock that profiles the locks.

Test results

Voxel:
The bare numbers are chunk generation times in µs.

Locked Count: 1  thread id using shared resource: 1 waiting thread id: 1
Locked Count: 2  thread id using shared resource: 21 waiting thread id: 1
Locked Count: 3  thread id using shared resource: 1 waiting thread id: 4
Locked Count: 4  thread id using shared resource: 4 waiting thread id: 1
3499
Locked Count: 5  thread id using shared resource: 4 waiting thread id: 1
2641
Locked Count: 6  thread id using shared resource: 1 waiting thread id: 4
3200
Locked Count: 7  thread id using shared resource: 4 waiting thread id: 1
2553
Locked Count: 8  thread id using shared resource: 1 waiting thread id: 4
3068
Locked Count: 9  thread id using shared resource: 4 waiting thread id: 1
2507
2593
Locked Count: 10  thread id using shared resource: 4 waiting thread id: 1
2409
2642
2595
Locked Count: 11  thread id using shared resource: 4 waiting thread id: 1
2875
Locked Count: 12  thread id using shared resource: 1 waiting thread id: 4
2763
Locked Count: 13  thread id using shared resource: 15 waiting thread id: 1
2332
Locked Count: 14  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 15  thread id using shared resource: 12 waiting thread id: 4
Locked Count: 16  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 17  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 18  thread id using shared resource: 1 waiting thread id: 4
2568
2769
2976
Locked Count: 19  thread id using shared resource: 5 waiting thread id: 6
Locked Count: 20  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 21  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 22  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 23  thread id using shared resource: 1 waiting thread id: 4
Locked Count: 24  thread id using shared resource: 4 waiting thread id: 1
1834
Locked Count: 25  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 26  thread id using shared resource: 9 waiting thread id: 1
Locked Count: 27  thread id using shared resource: 4 waiting thread id: 1
2058
2570
Locked Count: 28  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 29  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 30  thread id using shared resource: 4 waiting thread id: 1
3142
2111
Locked Count: 31  thread id using shared resource: 1 waiting thread id: 4
Locked Count: 32  thread id using shared resource: 4 waiting thread id: 1
3016
Locked Count: 33  thread id using shared resource: 4 waiting thread id: 1
3191
Locked Count: 34  thread id using shared resource: 12 waiting thread id: 4
Locked Count: 35  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 36  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 37  thread id using shared resource: 4 waiting thread id: 1
2124
Locked Count: 38  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 39  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 40  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 41  thread id using shared resource: 6 waiting thread id: 5
2562
Locked Count: 42  thread id using shared resource: 1 waiting thread id: 4
Locked Count: 43  thread id using shared resource: 4 waiting thread id: 1
2793
2751
Locked Count: 44  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 45  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 46  thread id using shared resource: 13 waiting thread id: 4
Locked Count: 47  thread id using shared resource: 1 waiting thread id: 4
Locked Count: 48  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 49  thread id using shared resource: 1 waiting thread id: 4
2003
Locked Count: 50  thread id using shared resource: 1 waiting thread id: 4
2764
Locked Count: 51  thread id using shared resource: 1 waiting thread id: 4
Locked Count: 52  thread id using shared resource: 4 waiting thread id: 1
2855
Locked Count: 53  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 54  thread id using shared resource: 1 waiting thread id: 4
2618
Locked Count: 55  thread id using shared resource: 5 waiting thread id: 6
3058
Locked Count: 56  thread id using shared resource: 6 waiting thread id: 1
2396
Locked Count: 57  thread id using shared resource: 4 waiting thread id: 1
2252
Locked Count: 58  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 59  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 60  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 61  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 62  thread id using shared resource: 1 waiting thread id: 4
Locked Count: 63  thread id using shared resource: 5 waiting thread id: 1
Locked Count: 64  thread id using shared resource: 1 waiting thread id: 4
2856
2463
Locked Count: 65  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 66  thread id using shared resource: 1 waiting thread id: 4
Locked Count: 67  thread id using shared resource: 5 waiting thread id: 4
2266

tps-demo:

Locked Count: 1  thread id using shared resource: 11 waiting thread id: 10
Locked Count: 2  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 3  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 4  thread id using shared resource: 1 waiting thread id: 4
Locked Count: 5  thread id using shared resource: 1 waiting thread id: 4
Locked Count: 6  thread id using shared resource: 1 waiting thread id: 4
Locked Count: 7  thread id using shared resource: 1 waiting thread id: 6
Locked Count: 8  thread id using shared resource: 7 waiting thread id: 1
Locked Count: 9  thread id using shared resource: 1 waiting thread id: 15
Locked Count: 10  thread id using shared resource: 6 waiting thread id: 1
Locked Count: 11  thread id using shared resource: 5 waiting thread id: 1
Locked Count: 12  thread id using shared resource: 12 waiting thread id: 5
Locked Count: 13  thread id using shared resource: 14 waiting thread id: 1
Locked Count: 14  thread id using shared resource: 8 waiting thread id: 1
Locked Count: 15  thread id using shared resource: 8 waiting thread id: 1
Locked Count: 16  thread id using shared resource: 5 waiting thread id: 1
Locked Count: 17  thread id using shared resource: 13 waiting thread id: 1
Locked Count: 18  thread id using shared resource: 1 waiting thread id: 11
Locked Count: 19  thread id using shared resource: 10 waiting thread id: 12
Locked Count: 20  thread id using shared resource: 1 waiting thread id: 11
Locked Count: 21  thread id using shared resource: 4 waiting thread id: 1
Locked Count: 22  thread id using shared resource: 1 waiting thread id: 14
Locked Count: 23  thread id using shared resource: 1 waiting thread id: 11
Locked Count: 24  thread id using shared resource: 1 waiting thread id: 11
Locked Count: 25  thread id using shared resource: 1 waiting thread id: 14
Locked Count: 26  thread id using shared resource: 1 waiting thread id: 11
Locked Count: 27  thread id using shared resource: 14 waiting thread id: 1
Locked Count: 28  thread id using shared resource: 1 waiting thread id: 14
Locked Count: 29  thread id using shared resource: 1 waiting thread id: 14
Locked Count: 30  thread id using shared resource: 1 waiting thread id: 14
Locked Count: 31  thread id using shared resource: 14 waiting thread id: 1
Locked Count: 32  thread id using shared resource: 14 waiting thread id: 1
Locked Count: 33  thread id using shared resource: 1 waiting thread id: 14
Locked Count: 34  thread id using shared resource: 14 waiting thread id: 11
Locked Count: 35  thread id using shared resource: 9 waiting thread id: 14
Locked Count: 36  thread id using shared resource: 12 waiting thread id: 1
Locked Count: 37  thread id using shared resource: 1 waiting thread id: 14
Locked Count: 38  thread id using shared resource: 12 waiting thread id: 14
Locked Count: 39  thread id using shared resource: 1 waiting thread id: 12
Locked Count: 40  thread id using shared resource: 12 waiting thread id: 1
Locked Count: 41  thread id using shared resource: 1 waiting thread id: 12
Locked Count: 42  thread id using shared resource: 14 waiting thread id: 12
Locked Count: 43  thread id using shared resource: 1 waiting thread id: 14
Locked Count: 44  thread id using shared resource: 1 waiting thread id: 12
Locked Count: 45  thread id using shared resource: 14 waiting thread id: 1
Locked Count: 46  thread id using shared resource: 1 waiting thread id: 14
Locked Count: 47  thread id using shared resource: 1 waiting thread id: 14
Locked Count: 48  thread id using shared resource: 1 waiting thread id: 14
Locked Count: 49  thread id using shared resource: 4 waiting thread id: 14
Locked Count: 50  thread id using shared resource: 14 waiting thread id: 1
Locked Count: 51  thread id using shared resource: 1 waiting thread id: 4
Locked Count: 52  thread id using shared resource: 1 waiting thread id: 4
Locked Count: 53  thread id using shared resource: 4 waiting thread id: 1

Custom SpinLock code:

class SpinLock2 {
	mutable std::atomic<bool> locked = ATOMIC_VAR_INIT(false);

public:
	_ALWAYS_INLINE_ void lock() const {
		static int thread_id_in_use;
		while (true) {
			bool expected = false;
			if (locked.compare_exchange_weak(expected, true, std::memory_order_acquire, std::memory_order_relaxed)) {
				break;
			}
			do {
				static int count = 0;
				count++;
				print_line("Locked Count:", count, " thread id using shared resource:", thread_id_in_use, "waiting thread id:", Thread::get_caller_id());
				_cpu_pause();
			} while (locked.load(std::memory_order_relaxed));
		}
		thread_id_in_use = Thread::get_caller_id();
	}

	_ALWAYS_INLINE_ void unlock() const {
		locked.store(false, std::memory_order_release);
	}
};

The probability that the SpinLock will block a thread is very low (0-10 blocks for generating one chunk, or 53 per minute of gameplay in tps-demo, most of which happen during loading), because a thread only blocks if we simultaneously allocate data in thread A and free data in thread B that was allocated in A.

I haven't fully looked into the SlabAllocator code yet, so I won't say yet whether it is faster in normal allocations or not.
But I noticed that Item has an extra 1-8 bytes of metadata for each element, while BlockAllocator doesn't, which makes it less cache friendly.

@hpvb (Member Author) commented Jan 2, 2025

> I haven't fully looked into the SlabAllocator code yet, so I won't say yet whether it is faster in normal allocations or not. But I noticed that Item has an extra 1-8 bytes of metadata for each element, while BlockAllocator doesn't, which makes it less cache friendly.

In almost all cases this lands in the padding between elements anyway, except when the items are already aligned. But this allocator was made for Variant specifically, where that is never true. The index is free in all cases where the allocator is used, and it means that there are no hashes and no lookups of any kind; everything is just pointer arithmetic on data already in cache.

It might indeed be slightly less friendly if the items already happen to be aligned, and then it could potentially waste some space, but since the index sits directly after the item I don't think it is cache unfriendly.

The bitmap itself is also aligned such that it should land on a different cache line than the main structure. It really is quite fast :)
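
To make the layout being discussed concrete, here is a sketch with assumed names, not the PR's actual declarations (the sizeof consequences are measured in the comments that follow): a per-item slot index stored right after the payload, plus an occupancy bitmap aligned so it lands on its own cache line, away from the hot per-slab fields.

#include <cstddef>
#include <cstdint>
#include <cstdio>

template <typename T>
struct SlabSketch {
	struct Item {
		T value;
		uint8_t index; // slot index stored right next to the payload
	};

	Item *items;          // hot per-slab bookkeeping, touched on every alloc/free
	uint32_t free_count;

	alignas(64) uint64_t used_mask; // occupancy bitmap pushed onto its own cache line
};

struct BucketSmallLike {
	uint8_t data[24]; // stand-in for a 24-byte Variant bucket
};

int main() {
	std::printf("sizeof(payload) = %zu, sizeof(Item) = %zu\n",
			sizeof(BucketSmallLike), sizeof(SlabSketch<BucketSmallLike>::Item));
	std::printf("offsetof(used_mask) = %zu\n",
			offsetof(SlabSketch<BucketSmallLike>, used_mask));
}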

@hpvb (Member Author) commented Jan 2, 2025

> In StaticBlockAllocator, I made a test to check the number of thread locks during allocation/free (I tested in https://github.com/godotengine/godot-demo-projects/tree/master/3d/voxel and tps-demo). For testing, I created a custom SpinLock for profile locks.

The problem with this approach is that doing extra work inside the spinlock invalidates the results; we need to actually benchmark the results in game. The odds of a spinlock blocking change dramatically with how much work is being done under it. It's a bit of a Heisenberg problem 😄

If your allocator achieves better frame rates than mine does then it is better! I'm not saying it is not better, it very well might be! However I don't think your instrumented spinlock proves whether or not it is better.

@Nazarwadim (Contributor)

Well, in general, I wanted to note that blocking is not frequent. Even if there are 2-3 times more locks, it is not significant.

@hpvb hpvb force-pushed the variant-alloc branch 8 times, most recently from 8866162 to 404c6e0 on January 3, 2025 02:13
New allocator for Variant.
@Nazarwadim (Contributor)

> In almost all cases, this is in the padding between elements anyway, except when the items are already aligned. But this allocator was made for Variant specifically where this is never true.

Structures and classes cannot reuse padding between elements. To prove it, you can write:

sizeof(ThreadSafeSlabAllocator<Variant::Pools::BucketSmall>::Item);

This returns 28 bytes, while BucketSmall itself is 24. The same will be true for BucketMedium and BucketLarge.

sizeof(ThreadSafeSlabAllocator<Variant>::Item);

And this returns 32, not 24.

> The index is free in all cases where the allocator is used, and it means that there are no hashes, no lookups of any kind. Everything is just pointer arithmetic of stuff already in cache.

The HashMap lookup is called only once for each thread. Using the id (which is already cached), we find our allocator in the allocator array, which is the same kind of pointer arithmetic.

> If your allocator achieves better frame rates than mine does then it is better!

Now, since you asked for benchmarks: I did the same tests as you did in this PR, merged your PR with #97016, and compared these branches:
https://github.com/Nazarwadim/godot-fork/tree/slab_allocator
https://github.com/Nazarwadim/godot-fork/tree/use_static_block_allocator_for_variant_alloc

benchmarks.ods

You can see that my allocator has a higher FPS.

In general, what I want to say is this: you wrote in Rocket.Chat that you don't want to have a discussion with me and that you'd rather stop working on this. And I say your allocator is great, but you're just using it in the wrong place, because we can store the allocator metadata directly in the Variant.
There are places where I would use your allocator:

I hope you agree with me)

@hpvb (Member Author) commented Jan 3, 2025

> In general, what I want to say. You wrote in rocket chat that you don't want to have a discussion with me and you'd better stop working on this.

I'll address the rest of your post later, but I'd like to clarify that I really didn't mean to suggest I didn't want to talk with you. If you ping me on rocket.chat I'd be happy to explain what I meant. I'm very sorry if I gave you the impression that I just didn't want to discuss things with you. That is really not what I wanted to say.

> This returns 28 bytes while BucketSmall itself 24. The same will be true for BucketMedium and BucketLarge.

You are right, I don't know what I was thinking. I thought I had measured this but apparently not. Or I tested something completely unrelated.
