Switching ReadOnlySpan<char> with constant strings #8640
Replies: 51 comments
-
This does seem really annoying. I'm not sure I have a language proposal that would fix this, but it would be something I'd like to address. The obvious route would be something like an implicit conversion on |
Beta Was this translation helpful? Give feedback.
-
Yes it does exist in .NET Core 2.1 (NOT with the portable version!!) If this was implemented, this should work with the portable version too. |
Beta Was this translation helpful? Give feedback.
-
Note: is there an idea on what the codegen here would actually be? The normal string-switch works by building a dictionary and switching on that. That likely wouldn't work for ReadOnlySpans. Perhaps emitting code for an inline-trie might be appropriate? |
Beta Was this translation helpful? Give feedback.
-
Why not? As far as I can tell, it doesn't actually use |
Beta Was this translation helpful? Give feedback.
-
@CyrusNajmabadi @svick |
Beta Was this translation helpful? Give feedback.
-
I'd like for |
Beta Was this translation helpful? Give feedback.
-
Oh terrific. Good to know! It does seem a bit interesting to me that it both has to scan the string to produce the hash, and then has to do the equality check again. It feels like a double pass of the string, when only one would be necessary since you're generating the code here manually. But maybe it would be a bit too much codebloat, or too branchy, when this can be very simple codegen. |
Beta Was this translation helpful? Give feedback.
-
@CyrusNajmabadi the comparison is necessary. you can make a perfect hash function to avoid collisions among your cases, but it's still possible for your input string to not be one of the switch options. We'd need something akin to VC++'s |
Beta Was this translation helpful? Give feedback.
-
@scalablecory I understand the need for the compare if you go the hash route. I was saying that i was suprised that the hash-route was used in the first place, isntead of just codegening something like a trie. |
Beta Was this translation helpful? Give feedback.
-
It states that a trie would be slower than a HashSet when searching entire strings. A trie would be faster if we are using pattern searching, but switches are not for that. |
Beta Was this translation helpful? Give feedback.
-
There would be no insertion operations. The compiler wouldn't be populating a generic trie container, it would be generating the trie-comparisons directly based on the literal strings provided. |
Beta Was this translation helpful? Give feedback.
-
Your link says this:
|
Beta Was this translation helpful? Give feedback.
-
yes. that's what i'm trying to say. Effectivley, right now the compiler has to walk every character of the string, to produce the hash. It then has to switch on the hash, and once done, the runtime has to then walk every character of the string a second time. This is basically the poster case for when you want a trie. Instead of having two loops of the string, you just walk each character of it one at a time, and you either end up exactly in the matching branch, or you fall out into the default branch label. |
Beta Was this translation helpful? Give feedback.
-
@HaloFour @CyrusNajmabadi I am talking about trie vs HashSet, not trie vs List or trie vs SortedList!!! |
Beta Was this translation helpful? Give feedback.
-
I don't see how it's relevant. It's not talkign about what i was talking about. Specifically that's a benchmark of a very specific trie impl against a hashset. Importantly, it's a trie actually built of in-memory objects. So you're incurring all sorts of hits trying to walk that. I'm talking about a trie represented as code. These are entirely different things. It would be like me complaining about the hashing that the compiler does here because of some aspect of a HashSet. They're entirely different beasts, even though they both involve 'hashing'. |
Beta Was this translation helpful? Give feedback.
-
This doesn't make sense to me. Why are UTF8 string literals not constants, according to the language? |
Beta Was this translation helpful? Give feedback.
-
Logically they are constants, but:
|
Beta Was this translation helpful? Give feedback.
-
That feels like circular logic -- utf8 literals can't be constants because they're not defined as constants. Pure language rules. As it stands it doesn't seem like a good justification. There are good reasons to make these constants and I haven't seen any reasons not to. Also, I don't think it has to be the case that all constant values have to be allowed as optional parameter values. That's another language rule that can be changed if implementation restrictions require it. |
Beta Was this translation helpful? Give feedback.
-
And pratically they are not constants: using System;
using System.Runtime.CompilerServices;
ReadOnlySpan<byte> x = "Hallo"u8;
Console.WriteLine(x[0]); //72
ref readonly var b = ref x[0];
ref var bwrite = ref Unsafe.AsRef(in b);
bwrite = 7;
Console.WriteLine(b); //7
Console.WriteLine(bwrite); //7
Console.WriteLine(x[0]); //7 As written in the proposal, those strings / bytes are stored in the |
Beta Was this translation helpful? Give feedback.
-
If you want to go unsafe, regular strings are not "constant" either. var x ="Hallo";
fixed(char* p = x) p[0]='X';
Console.Write("Hallo"); But I think that's besides the point. I understand #1881 (comment) as a call for a feature to make more types viable for a constant (including span or other structs) which will cover utf8 strings as a side effect. That would be still different from making the utf8 string literal itself a constant as that may require special handling - this issue. (There's a few other proposals to allow |
Beta Was this translation helpful? Give feedback.
-
For string it is an implementation detail - they are actually not allocated in ro memory sections.
Maybe this should be added to the spec for UT8-literals, too. |
Beta Was this translation helpful? Give feedback.
-
Even simpler: unsafe {
byte* datas = stackalloc byte[]
{
1,2,3,4,5,
};
ReadOnlySpan<byte> x = new(datas, 5);
Console.WriteLine(datas[0]); //1
Console.WriteLine(x[0]); //1
datas[0] = 7;
Console.WriteLine(datas[0]); //7
Console.WriteLine(x[0]); //7
} ROS isn't a frozen collection, it's just a read only view on some datas. |
Beta Was this translation helpful? Give feedback.
-
@FaustVX You're misunderstanding the proposal. It's not to allow all And yes, I regard implementation restrictions, like what's allowed in metadata, as being questions about the language implementation, not the language definition (the spec). If the use cases we're thinking of (switching) are completely unimplementable -- that's a good reason not to do it in the language. But if there simply need to be some restrictions on where they can go (e.g., not allowed in optional parameters), that seems fine to me. |
Beta Was this translation helpful? Give feedback.
-
it def feels (to me) at least) that there's some space for value to be constant even if the type it is isn't always a constant. char[] chars = { 'a', 'b', 'c' };
ReadOnlySpan<char> span = chars.AsSpan(); However, in a cast where there the value is a literal, we could make the claim it's a constant. So this would be ok: const ReadOnlySpan<byte> ConstantString = "abc"u8; |
Beta Was this translation helpful? Give feedback.
-
Right, to be clear, that's how all constants work. There are no "constant types" in the language, there are only constant values, and it happens to be the case that those values are only of certain types. That is,
Moreover, allowing |
Beta Was this translation helpful? Give feedback.
-
(btw this issue is about switching, but I don't see any reason why we can't allow utf8 string literals in optional parameters either. It would be implemented like DecimalConstantAttribute) |
Beta Was this translation helpful? Give feedback.
-
@agocke Good points. Thanks! I"m good with championing this alongside Julien :) |
Beta Was this translation helpful? Give feedback.
-
One more thing that I thought of. Making this change has some interesting implications for other types. In particular, this would be the only way in the language to produce a constant-expr byte string, and the only way to match against a constant-expr byte string. Depending on which particular byte string you're using, I could see people encoding it in a UTF8 string (assuming that it's valid UTF8) just so they could use the constant-expr functionality. Honestly, it seems like a pretty reasonable usage. I wouldn't try to block this, but instead look it is as potentially another language deficiency -- there's currently no way to produce a constant byte array in the language. I know there's a separate working group looking at new literals for lists and dictionaries. It might be interesting to either explore one of those syntaxes producing a constant expression. Alternatively, it might be possible to make |
Beta Was this translation helpful? Give feedback.
-
Actually, even the workaround that the OP's suggests is not correct. The
We have to use
We do need a better way to compare spans with strings, with lees hoop-jumping |
Beta Was this translation helpful? Give feedback.
-
Are there any other places in the language where To me at least, I imagine this probably comes with its own strings attached but I thought I'd at least toss it out there. I'm definitely interested in hearing about any potential drawbacks to this solution. I'm not an expert in this space. |
Beta Was this translation helpful? Give feedback.
-
Speclet: https://github.com/dotnet/csharplang/blob/main/proposals/csharp-11.0/pattern-match-span-of-char-on-string.md
With the recent addition of Spans, we can reduce a lot of allocations when dealing with strings. For example, substrings can be replaced with slices. However, there is one place where using strings is better:
With spans, you are forced to write boilerplate code:
Which is identical to the if-else mess that switches aim to solve.
It would be better if spans can be switched with constant strings.
Special-casing spans as a compiler-known type already have precedent in stackalloc expressions, it can also be done here.
Possible extension to regular Span<char> can also be considered, but it is not as important as ReadOnlySpan<char>. We could also enable this with ReadOnlyMemory<char> and Memory<char>.
More possible extensions: switching (ReadOnly)Span<char> with char, (ReadOnly)Span<T> with T where T is a constant type.
Design meetings
https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-02-23.md#pattern-matching-over-spanchar
Beta Was this translation helpful? Give feedback.
All reactions