Zero-allocation querystring parser #33840

SteveSandersonMS · 2021-06-25T11:59:04Z

Background and Motivation

Currently the primary API for parsing querystrings in ASP.NET Core is QueryHelpers.ParseQuery.

Quiz: How many allocations do you get by calling QueryHelpers.ParseQuery(currentUrl)?
Answer: Obviously it depends on the value of currentUrl, but given the default URL length limit of 8KB, we can get over 3200 allocations from this one call. Maybe more; that's just what I got from a quick experiment.

If you need to build a key/values dictionary from the querystring, having this many allocations is mostly unavoidable. However, there are cases where you are only interested in extracting one value or a subset of values. In these cases, you (a) don't require a dictionary containing everything, and (b) you don't want to pay the costs of URL decoding irrelevant values.

The particular scenario I'm dealing with now is Blazor's querystring enhancements. For this, we already know statically which querystring parameter names a given component is interested in. Typically it will be < 5 parameters. It's undesirable that, on every navigation (which for Blazor Server just means a single SignalR message saying "location changed"), we'd build a dictionary that may also contain thousands of irrelevant key/value pairs if someone is deliberately trying to stress the system. Originally I was going to solve this purely as an internal implementation detail, but @davidfowl has suggested this might be useful as a public API.

Proposed API

An enumerator that operates over ReadOnlySpan<char>, and doesn't attempt to decode the key/value pairs except if you explicitly ask it to.

namespace Microsoft.AspNetCore.WebUtilities
{
    public readonly ref struct QueryStringEnumerable
    {
         public QueryStringEnumerable(ReadOnlySpan<char> queryString);
         public QueryStringEnumerator GetEnumerator();
    }

    public ref struct QueryStringEnumerator
    {
        public QueryStringNameValuePair Current { get; }
        public bool MoveNext();
    }

    public readonly ref struct QueryStringNameValuePair
    {
        public ReadOnlySpan<char> EncodedName { get; }
        public ReadOnlySpan<char> EncodedValue { get; }
        public string DecodeName();
        public string DecodeValue();
    }
}

The way this works is simply splitting the existing parsing logic out of QueryHelpers.ParseQuery, decoupling it from the decoding-and-building-a-dictionary aspect. Of course we'd also retain the existing QueryHelpers.ParseQuery. Its implementation would become much simpler as it just has to use this new enumerator to populate a KeyValueAccumulator.

Implementation

Here's an approximate implementation: main...stevesa/component-params-from-querystring#diff-7e7c52bc7413177d1e2dec2dcaecad3bf0769f9bf7af2aea33e4f55956e4172f. It doesn't have the DecodeName()/DecodeValue() methods but they would be pretty trivial.

Usage Examples

The exact patterns depend on things like whether you want to collect multiple values or stop on the first match, whether you need to decode the keys to match them, whether you want to be case-sensitive, etc.

Reading from the querystring

string? somethingValue = null;
var enumerable = new QueryStringEnumerable(Request.QueryString.Value);
foreach (var pair in enumerable)
{
    // For most keys, we don't need to decode, and most values can be ignored completely
    if (pair.EncodedName.Equals("something", StringComparer.OrdinalIgnoreCase))
    {
        somethingValue = pair.DecodeValue(); // Decoding does allocate, because Uri.UnescapeDataString operates on a string, and you get back a string
        break;
    }
}

Modifying the querystring

If you wanted to get a URL for "current URL except with one extra/modified query param", which will be a common pattern in the Blazor case, then you could build a new URL string by walking the existing URL and just adding/omitting/modifying a single param value. There would be no allocation except for a StringBuilder or similar.

Alternatives

This could just be an internal implementation detail. Blazor could consume this via shared-source.

We might also want to have a deconstructor on QueryStringNameValuePair so that you could do foreach (var (encodedName, encodedValue) in enumerable) but it's unclear this is really beneficial.

Risks

It's possible that people might not understand how EncodedName/EncodedValue differs from the strings they normally receive and could implement buggy logic. However I think this is mostly mitigated by the fact that these are typed as ReadOnlySpan<char>. Less familiar developers will either:

Just use QueryHelpers.ParseQuery, because it's more obvious how to consume a dictionary
Or, just call DecodeName()/DecodeValue() because they want a string

... so in either case they wouldn't see the encoded data.

The text was updated successfully, but these errors were encountered:

Tratcher · 2021-06-25T12:34:44Z

FYI there was just a refactor here to deal with some of these issues. #32829

javiercn · 2021-06-25T12:49:19Z

@SteveSandersonMS thanks for the explanation.

I'm super supportive of this change, and I'll defer to @Tratcher for the specific API/implementation concerns, however in general its fine to have these more advanced APIs even if they offer a higher learning curve for customers or have a risk of being used incorrectly.

In the majority of cases customers will use the simple "one-off" APIs for convenience and will only reach for these when they need to.

pranavkm · 2021-06-25T13:22:14Z

This is cool. @Tratcher any reason we wouldn't want this in the framework?

API review hat:, QueryStringEnumerator could be a nested type:

public readonly ref struct QueryStringEnumerable
{
    public QueryStringEnumerable(ReadOnlySpan<char> queryString);
    public Enumerator GetEnumerator();
    
    public ref struct Enumerator
    {
        public QueryStringNameValuePair Current { get; }
        public bool MoveNext();
    }
}

davidfowl · 2021-06-25T15:37:28Z

This looks good. Can Decode(Name/Value) return a ReadOnlySpan<char>?

SteveSandersonMS · 2021-06-25T16:29:40Z

This looks good. Can Decode(Name/Value) return a ReadOnlySpan?

It could, but as things stand, the underlying Uri.UnescapeDataString by takes and returns a string, so we would still allocate (at least one) new string. So as of today it wouldn't help at all to then spanify the result.

Maybe in the future the runtime could add an overload of Uri.UnescapeDataString that takes and returns a ReadOnlySpan<char>. In the case where the input doesn't require any modification to be decoded, it could return back the original input with no allocation. It would only have to allocate if the input does contain encoded characters.

Since in most cases, developers who decode a value want to use it with a string-based API, we'd force them to .ToString() on the result. Would this allocate a further string, or is the runtime somehow smart enough to retain the info that someString.AsSpan().ToString() can reuse the original someString instance?

Tratcher · 2021-06-25T18:03:04Z

I do recommend using ReadOnlySpan for the Decode APIs, many keys and values won't need decoding and wouldn't need to allocate. It's especially interesting when you're searching for a specific key and need the decoded version so you can do a normalized comparison. UnescapeDataString(string) is an implementation detail we can address.

We could also have two overloads; string DecodeNameString(); and ReadOnlySpan<char> DecodeNameSpan();

SteveSandersonMS · 2021-06-25T18:06:53Z

OK, thanks for the updates everyone!

Is there a specific approval process for an API review item? Based on the apparent consensus here, I plan to just implement this now. Please let me know if not.

We could also have two overloads; string DecodeNameString(); and ReadOnlySpan DecodeNameSpan();

I'll do something like that and we can figure out the final naming during API review.

Tratcher · 2021-06-25T18:07:44Z

The API as a whole is interesting. We should also consider some direct integration with QueryString like QueryString.GetEnumerator().

Now that you've shown me this I also want one for PathString that enumerates segments. Would that be useful for routing @javiercn?

javiercn · 2021-06-25T18:09:59Z

@Tratcher routing already has a faspathtokenizer. See https://github.com/dotnet/aspnetcore/blob/main/src/Http/Routing/src/Matching/FastPathTokenizer.cs

Tratcher · 2021-06-25T18:18:50Z

Make sure to take improvements from the updated parser here: https://github.com/dotnet/aspnetcore/pull/32829/files#diff-d205a0da19dfbe06bc1894ef024ec0bd685fa1770f017ec4b7d30e2a0ced3308R108

ghost · 2021-06-25T20:04:36Z

Thanks for contacting us.

We're moving this issue to the Next sprint planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s).
If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues.
To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

pranavkm · 2021-06-26T13:48:39Z

It could, but as things stand, the underlying Uri.UnescapeDataString by takes and returns a string, so we would still allocate (at least one) new string. So as of today it wouldn't help at all to then spanify the result.

@SteveSandersonMS could you file a runtime ask for this API?

SteveSandersonMS · 2021-06-28T13:45:17Z

@pranavkm Filed dotnet/runtime#54828

SteveSandersonMS · 2021-06-29T14:28:08Z

Done in #33910 (comment)

SteveSandersonMS added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jun 25, 2021

SteveSandersonMS assigned davidfowl, pranavkm and javiercn Jun 25, 2021

Tratcher added the area-runtime label Jun 25, 2021

BrennanConroy added this to the Next sprint planning milestone Jun 25, 2021

This was referenced Jun 28, 2021

QueryStringEnumerable API #33910

Merged

ReadOnlySpan<char> overload of Uri.UnescapeDataString dotnet/runtime#54828

Closed

SteveSandersonMS closed this as completed Jun 29, 2021

SteveSandersonMS added the Done This issue has been fixed label Jun 29, 2021

SteveSandersonMS modified the milestones: Next sprint planning, 6.0-preview6 Jun 29, 2021

guardrex mentioned this issue Jul 20, 2021

Blazor 6.0 content tracking issue dotnet/AspNetCore.Docs#22045

Closed

41 tasks

ghost locked as resolved and limited conversation to collaborators Jul 29, 2021

amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Aug 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Zero-allocation querystring parser #33840

Zero-allocation querystring parser #33840

SteveSandersonMS commented Jun 25, 2021 •

edited

Loading

Tratcher commented Jun 25, 2021

javiercn commented Jun 25, 2021

pranavkm commented Jun 25, 2021

davidfowl commented Jun 25, 2021 •

edited

Loading

SteveSandersonMS commented Jun 25, 2021

Tratcher commented Jun 25, 2021

SteveSandersonMS commented Jun 25, 2021

Tratcher commented Jun 25, 2021

javiercn commented Jun 25, 2021 •

edited

Loading

Tratcher commented Jun 25, 2021

ghost commented Jun 25, 2021

pranavkm commented Jun 26, 2021

SteveSandersonMS commented Jun 28, 2021

SteveSandersonMS commented Jun 29, 2021

Zero-allocation querystring parser #33840

Zero-allocation querystring parser #33840

Comments

SteveSandersonMS commented Jun 25, 2021 • edited Loading

Background and Motivation

Proposed API

Implementation

Usage Examples

Reading from the querystring

Modifying the querystring

Alternatives

Risks

Tratcher commented Jun 25, 2021

javiercn commented Jun 25, 2021

pranavkm commented Jun 25, 2021

davidfowl commented Jun 25, 2021 • edited Loading

SteveSandersonMS commented Jun 25, 2021

Tratcher commented Jun 25, 2021

SteveSandersonMS commented Jun 25, 2021

Tratcher commented Jun 25, 2021

javiercn commented Jun 25, 2021 • edited Loading

Tratcher commented Jun 25, 2021

ghost commented Jun 25, 2021

pranavkm commented Jun 26, 2021

SteveSandersonMS commented Jun 28, 2021

SteveSandersonMS commented Jun 29, 2021

SteveSandersonMS commented Jun 25, 2021 •

edited

Loading

davidfowl commented Jun 25, 2021 •

edited

Loading

javiercn commented Jun 25, 2021 •

edited

Loading