Zero-allocation querystring parser #33840
Comments
FYI there was just a refactor here to deal with some of these issues. #32829 |
@SteveSandersonMS thanks for the explanation. I'm super supportive of this change, and I'll defer to @Tratcher for the specific API/implementation concerns. In general, though, it's fine to have these more advanced APIs even if they have a steeper learning curve for customers or risk being used incorrectly. In the majority of cases customers will use the simple "one-off" APIs for convenience and will only reach for these when they need to. |
This is cool. @Tratcher any reason we wouldn't want this in the framework? API review hat:
public readonly ref struct QueryStringEnumerable
{
    public QueryStringEnumerable(ReadOnlySpan<char> queryString);
    public Enumerator GetEnumerator();
    public ref struct Enumerator
    {
        public QueryStringNameValuePair Current { get; }
        public bool MoveNext();
    }
} |
This looks good. Can Decode(Name/Value) return a ReadOnlySpan<char>? |
It could, but as things stand, the underlying Maybe in the future the runtime could add an overload of Since in most cases, developers who decode a value want to use it with a string-based API, we'd force them to |
I do recommend using ReadOnlySpan for the Decode APIs, many keys and values won't need decoding and wouldn't need to allocate. It's especially interesting when you're searching for a specific key and need the decoded version so you can do a normalized comparison. UnescapeDataString(string) is an implementation detail we can address. We could also have two overloads; |
OK, thanks for the updates everyone! Is there a specific approval process for an API review item? Based on the apparent consensus here, I plan to just implement this now. Please let me know if not.
I'll do something like that and we can figure out the final naming during API review. |
The API as a whole is interesting. We should also consider some direct integration with QueryString like QueryString.GetEnumerator(). Now that you've shown me this I also want one for PathString that enumerates segments. Would that be useful for routing @javiercn? |
@Tratcher routing already has a FastPathTokenizer. See https://github.com/dotnet/aspnetcore/blob/main/src/Http/Routing/src/Matching/FastPathTokenizer.cs |
Make sure to take improvements from the updated parser here: https://github.com/dotnet/aspnetcore/pull/32829/files#diff-d205a0da19dfbe06bc1894ef024ec0bd685fa1770f017ec4b7d30e2a0ced3308R108 |
Thanks for contacting us. We're moving this issue to the |
@SteveSandersonMS could you file a runtime ask for this API? |
Done in #33910 (comment) |
Background and Motivation
Currently the primary API for parsing querystrings in ASP.NET Core is QueryHelpers.ParseQuery.
Quiz: How many allocations do you get by calling QueryHelpers.ParseQuery(currentUrl)?
Answer: Obviously it depends on the value of currentUrl, but given the default URL length limit of 8KB, we can get over 3200 allocations from this one call. Maybe more; that's just what I got from a quick experiment.
If you need to build a key/values dictionary from the querystring, having this many allocations is mostly unavoidable. However, there are cases where you are only interested in extracting one value or a subset of values. In these cases, you (a) don't require a dictionary containing everything, and (b) you don't want to pay the costs of URL decoding irrelevant values.
The particular scenario I'm dealing with now is Blazor's querystring enhancements. For this, we already know statically which querystring parameter names a given component is interested in. Typically it will be < 5 parameters. It's undesirable that, on every navigation (which for Blazor Server just means a single SignalR message saying "location changed"), we'd build a dictionary that may also contain thousands of irrelevant key/value pairs if someone is deliberately trying to stress the system. Originally I was going to solve this purely as an internal implementation detail, but @davidfowl has suggested this might be useful as a public API.
Proposed API
An enumerator that operates over ReadOnlySpan<char>, and doesn't attempt to decode the key/value pairs except if you explicitly ask it to.
The way this works is simply splitting the existing parsing logic out of QueryHelpers.ParseQuery, decoupling it from the decoding-and-building-a-dictionary aspect. Of course we'd also retain the existing QueryHelpers.ParseQuery. Its implementation would become much simpler as it just has to use this new enumerator to populate a KeyValueAccumulator.
Implementation
Here's an approximate implementation: main...stevesa/component-params-from-querystring#diff-7e7c52bc7413177d1e2dec2dcaecad3bf0769f9bf7af2aea33e4f55956e4172f. It doesn't have the DecodeName()/DecodeValue() methods but they would be pretty trivial.
Usage Examples
The exact patterns depend on things like whether you want to collect multiple values or stop on the first match, whether you need to decode the keys to match them, whether you want to be case-sensitive, etc.
Reading from the querystring
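As a rough illustration of the "stop on the first match" pattern, here is a minimal sketch. SketchQueryEnumerable, SketchPair, and QueryLookup.FindFirst are hypothetical stand-ins written for this example so it runs without ASP.NET Core; they are not the proposed types, and the real enumerator may differ. The sketch assumes the parameter name needs no decoding, matches it case-insensitively, and treats '+' as a space when decoding.

```csharp
using System;

// Hypothetical stand-in for the proposed enumerator (an assumption, not the real API).
public readonly ref struct SketchQueryEnumerable
{
    private readonly ReadOnlySpan<char> _query;

    public SketchQueryEnumerable(ReadOnlySpan<char> query)
    {
        // Skip a leading '?' if present.
        _query = !query.IsEmpty && query[0] == '?' ? query.Slice(1) : query;
    }

    public Enumerator GetEnumerator() => new Enumerator(_query);

    public ref struct Enumerator
    {
        private ReadOnlySpan<char> _remaining;
        private ReadOnlySpan<char> _name;
        private ReadOnlySpan<char> _value;

        internal Enumerator(ReadOnlySpan<char> query)
        {
            _remaining = query;
            _name = ReadOnlySpan<char>.Empty;
            _value = ReadOnlySpan<char>.Empty;
        }

        public SketchPair Current => new SketchPair(_name, _value);

        public bool MoveNext()
        {
            while (!_remaining.IsEmpty)
            {
                // Cut the next "name=value" segment off the front; nothing is copied.
                int amp = _remaining.IndexOf('&');
                ReadOnlySpan<char> segment = amp < 0 ? _remaining : _remaining.Slice(0, amp);
                _remaining = amp < 0 ? ReadOnlySpan<char>.Empty : _remaining.Slice(amp + 1);
                if (segment.IsEmpty) continue; // tolerate "a=1&&b=2"

                int eq = segment.IndexOf('=');
                _name = eq < 0 ? segment : segment.Slice(0, eq);
                _value = eq < 0 ? ReadOnlySpan<char>.Empty : segment.Slice(eq + 1);
                return true;
            }
            return false;
        }
    }
}

// Hypothetical pair type: encoded spans plus an explicit, allocating decode.
public readonly ref struct SketchPair
{
    public SketchPair(ReadOnlySpan<char> name, ReadOnlySpan<char> value)
    {
        EncodedName = name;
        EncodedValue = value;
    }

    public ReadOnlySpan<char> EncodedName { get; }
    public ReadOnlySpan<char> EncodedValue { get; }

    // Decoding allocates strings, which is why it only happens on request.
    public string DecodeValue() =>
        Uri.UnescapeDataString(EncodedValue.ToString().Replace('+', ' '));
}

public static class QueryLookup
{
    // Returns the decoded value of the first matching parameter, or null if absent.
    public static string FindFirst(ReadOnlySpan<char> query, ReadOnlySpan<char> name)
    {
        foreach (var pair in new SketchQueryEnumerable(query))
        {
            if (pair.EncodedName.Equals(name, StringComparison.OrdinalIgnoreCase))
                return pair.DecodeValue(); // only the matching value is ever decoded
        }
        return null;
    }
}
```

Calling QueryLookup.FindFirst(currentUrlQuery, "page") walks the query once and allocates only when a match is decoded; irrelevant pairs are skipped as spans.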
Modifying the querystring
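A rough sketch of rewriting a single parameter while copying everything else verbatim. QueryRewrite.WithParam is a hypothetical helper invented for this example; it splits the span inline rather than using the proposed enumerator, so that the block stays self-contained. It assumes the parameter name needs no encoding, compares encoded names ordinally, and drops any later duplicates of the replaced name.

```csharp
using System;
using System.Text;

public static class QueryRewrite
{
    // Returns the query with `name` set to `newValue`: replaced if present, appended if not.
    public static string WithParam(ReadOnlySpan<char> query, string name, string newValue)
    {
        if (!query.IsEmpty && query[0] == '?') query = query.Slice(1);

        var sb = new StringBuilder(query.Length + name.Length + newValue.Length + 2);
        bool replaced = false;

        while (!query.IsEmpty)
        {
            // Take the next "name=value" segment without copying it.
            int amp = query.IndexOf('&');
            ReadOnlySpan<char> segment = amp < 0 ? query : query.Slice(0, amp);
            query = amp < 0 ? ReadOnlySpan<char>.Empty : query.Slice(amp + 1);
            if (segment.IsEmpty) continue;

            int eq = segment.IndexOf('=');
            ReadOnlySpan<char> encodedName = eq < 0 ? segment : segment.Slice(0, eq);

            if (encodedName.Equals(name.AsSpan(), StringComparison.Ordinal))
            {
                if (replaced) continue; // drop later duplicates of the same name
                replaced = true;
                AppendPair(sb, name, newValue);
            }
            else
            {
                // Unrelated pairs are copied as-is, still encoded, never decoded.
                sb.Append(sb.Length == 0 ? '?' : '&').Append(segment);
            }
        }

        if (!replaced) AppendPair(sb, name, newValue);
        return sb.ToString();
    }

    private static void AppendPair(StringBuilder sb, string name, string value)
    {
        // Uri.EscapeDataString allocates one short-lived string for the new value.
        sb.Append(sb.Length == 0 ? '?' : '&')
          .Append(name)
          .Append('=')
          .Append(Uri.EscapeDataString(value));
    }
}
```

Apart from the StringBuilder, its final string, and the one escaped value, nothing is allocated per pair.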
If you wanted to get a URL for "current URL except with one extra/modified query param", which will be a common pattern in the Blazor case, then you could build a new URL string by walking the existing URL and just adding/omitting/modifying a single param value. There would be no allocation except for a StringBuilder or similar.
Alternatives
This could just be an internal implementation detail. Blazor could consume this via shared-source.
We might also want to have a deconstructor on QueryStringNameValuePair so that you could do foreach (var (encodedName, encodedValue) in enumerable) but it's unclear this is really beneficial.
Risks
It's possible that people might not understand how EncodedName/EncodedValue differ from the strings they normally receive and could implement buggy logic. However, I think this is mostly mitigated by the fact that these are typed as ReadOnlySpan<char>. Less familiar developers will either:
- use QueryHelpers.ParseQuery, because it's more obvious how to consume a dictionary
- call DecodeName()/DecodeValue() because they want a string
... so in either case they wouldn't see the encoded data.