Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zero-allocation querystring parser #33840

Closed
SteveSandersonMS opened this issue Jun 25, 2021 · 14 comments
Closed

Zero-allocation querystring parser #33840

SteveSandersonMS opened this issue Jun 25, 2021 · 14 comments
Assignees
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions Done This issue has been fixed
Milestone

Comments

@SteveSandersonMS
Copy link
Member

SteveSandersonMS commented Jun 25, 2021

Background and Motivation

Currently the primary API for parsing querystrings in ASP.NET Core is QueryHelpers.ParseQuery.

Quiz: How many allocations do you get by calling QueryHelpers.ParseQuery(currentUrl)?
Answer: Obviously it depends on the value of currentUrl, but given the default URL length limit of 8KB, we can get over 3200 allocations from this one call. Maybe more; that's just what I got from a quick experiment.

If you need to build a key/values dictionary from the querystring, having this many allocations is mostly unavoidable. However, there are cases where you are only interested in extracting one value or a subset of values. In these cases, you (a) don't require a dictionary containing everything, and (b) you don't want to pay the costs of URL decoding irrelevant values.

The particular scenario I'm dealing with now is Blazor's querystring enhancements. For this, we already know statically which querystring parameter names a given component is interested in. Typically it will be < 5 parameters. It's undesirable that, on every navigation (which for Blazor Server just means a single SignalR message saying "location changed"), we'd build a dictionary that may also contain thousands of irrelevant key/value pairs if someone is deliberately trying to stress the system. Originally I was going to solve this purely as an internal implementation detail, but @davidfowl has suggested this might be useful as a public API.

Proposed API

An enumerator that operates over ReadOnlySpan<char>, and doesn't attempt to decode the key/value pairs except if you explicitly ask it to.

namespace Microsoft.AspNetCore.WebUtilities
{
    public readonly ref struct QueryStringEnumerable
    {
         public QueryStringEnumerable(ReadOnlySpan<char> queryString);
         public QueryStringEnumerator GetEnumerator();
    }

    public ref struct QueryStringEnumerator
    {
        public QueryStringNameValuePair Current { get; }
        public bool MoveNext();
    }

    public readonly ref struct QueryStringNameValuePair
    {
        public ReadOnlySpan<char> EncodedName { get; }
        public ReadOnlySpan<char> EncodedValue { get; }
        public string DecodeName();
        public string DecodeValue();
    }
}

The way this works is simply splitting the existing parsing logic out of QueryHelpers.ParseQuery, decoupling it from the decoding-and-building-a-dictionary aspect. Of course we'd also retain the existing QueryHelpers.ParseQuery. Its implementation would become much simpler as it just has to use this new enumerator to populate a KeyValueAccumulator.

Implementation

Here's an approximate implementation: main...stevesa/component-params-from-querystring#diff-7e7c52bc7413177d1e2dec2dcaecad3bf0769f9bf7af2aea33e4f55956e4172f. It doesn't have the DecodeName()/DecodeValue() methods but they would be pretty trivial.

Usage Examples

The exact patterns depend on things like whether you want to collect multiple values or stop on the first match, whether you need to decode the keys to match them, whether you want to be case-sensitive, etc.

Reading from the querystring

string? somethingValue = null;
var enumerable = new QueryStringEnumerable(Request.QueryString.Value);
foreach (var pair in enumerable)
{
    // For most keys, we don't need to decode, and most values can be ignored completely
    if (pair.EncodedName.Equals("something", StringComparer.OrdinalIgnoreCase))
    {
        somethingValue = pair.DecodeValue(); // Decoding does allocate, because Uri.UnescapeDataString operates on a string, and you get back a string
        break;
    }
}

Modifying the querystring

If you wanted to get a URL for "current URL except with one extra/modified query param", which will be a common pattern in the Blazor case, then you could build a new URL string by walking the existing URL and just adding/omitting/modifying a single param value. There would be no allocation except for a StringBuilder or similar.

Alternatives

This could just be an internal implementation detail. Blazor could consume this via shared-source.

We might also want to have a deconstructor on QueryStringNameValuePair so that you could do foreach (var (encodedName, encodedValue) in enumerable) but it's unclear this is really beneficial.

Risks

It's possible that people might not understand how EncodedName/EncodedValue differs from the strings they normally receive and could implement buggy logic. However I think this is mostly mitigated by the fact that these are typed as ReadOnlySpan<char>. Less familiar developers will either:

  • Just use QueryHelpers.ParseQuery, because it's more obvious how to consume a dictionary
  • Or, just call DecodeName()/DecodeValue() because they want a string

... so in either case they wouldn't see the encoded data.

@SteveSandersonMS SteveSandersonMS added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jun 25, 2021
@Tratcher
Copy link
Member

FYI there was just a refactor here to deal with some of these issues. #32829

@javiercn
Copy link
Member

@SteveSandersonMS thanks for the explanation.

I'm super supportive of this change, and I'll defer to @Tratcher for the specific API/implementation concerns, however in general its fine to have these more advanced APIs even if they offer a higher learning curve for customers or have a risk of being used incorrectly.

In the majority of cases customers will use the simple "one-off" APIs for convenience and will only reach for these when they need to.

@pranavkm
Copy link
Contributor

This is cool. @Tratcher any reason we wouldn't want this in the framework?

API review hat:, QueryStringEnumerator could be a nested type:

public readonly ref struct QueryStringEnumerable
{
    public QueryStringEnumerable(ReadOnlySpan<char> queryString);
    public Enumerator GetEnumerator();
    
    public ref struct Enumerator
    {
        public QueryStringNameValuePair Current { get; }
        public bool MoveNext();
    }
}

@davidfowl
Copy link
Member

davidfowl commented Jun 25, 2021

This looks good. Can Decode(Name/Value) return a ReadOnlySpan<char>?

@SteveSandersonMS
Copy link
Member Author

This looks good. Can Decode(Name/Value) return a ReadOnlySpan?

It could, but as things stand, the underlying Uri.UnescapeDataString by takes and returns a string, so we would still allocate (at least one) new string. So as of today it wouldn't help at all to then spanify the result.

Maybe in the future the runtime could add an overload of Uri.UnescapeDataString that takes and returns a ReadOnlySpan<char>. In the case where the input doesn't require any modification to be decoded, it could return back the original input with no allocation. It would only have to allocate if the input does contain encoded characters.

Since in most cases, developers who decode a value want to use it with a string-based API, we'd force them to .ToString() on the result. Would this allocate a further string, or is the runtime somehow smart enough to retain the info that someString.AsSpan().ToString() can reuse the original someString instance?

@Tratcher
Copy link
Member

I do recommend using ReadOnlySpan for the Decode APIs, many keys and values won't need decoding and wouldn't need to allocate. It's especially interesting when you're searching for a specific key and need the decoded version so you can do a normalized comparison. UnescapeDataString(string) is an implementation detail we can address.

We could also have two overloads; string DecodeNameString(); and ReadOnlySpan<char> DecodeNameSpan();

@SteveSandersonMS
Copy link
Member Author

OK, thanks for the updates everyone!

Is there a specific approval process for an API review item? Based on the apparent consensus here, I plan to just implement this now. Please let me know if not.

We could also have two overloads; string DecodeNameString(); and ReadOnlySpan DecodeNameSpan();

I'll do something like that and we can figure out the final naming during API review.

@Tratcher
Copy link
Member

The API as a whole is interesting. We should also consider some direct integration with QueryString like QueryString.GetEnumerator().

Now that you've shown me this I also want one for PathString that enumerates segments. Would that be useful for routing @javiercn?

@javiercn
Copy link
Member

javiercn commented Jun 25, 2021

@Tratcher
Copy link
Member

@BrennanConroy BrennanConroy added this to the Next sprint planning milestone Jun 25, 2021
@ghost
Copy link

ghost commented Jun 25, 2021

Thanks for contacting us.

We're moving this issue to the Next sprint planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s).
If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues.
To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

@pranavkm
Copy link
Contributor

It could, but as things stand, the underlying Uri.UnescapeDataString by takes and returns a string, so we would still allocate (at least one) new string. So as of today it wouldn't help at all to then spanify the result.

@SteveSandersonMS could you file a runtime ask for this API?

@SteveSandersonMS
Copy link
Member Author

@pranavkm Filed dotnet/runtime#54828

@SteveSandersonMS
Copy link
Member Author

Done in #33910 (comment)

@SteveSandersonMS SteveSandersonMS added the Done This issue has been fixed label Jun 29, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jul 29, 2021
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions Done This issue has been fixed
Projects
None yet
Development

No branches or pull requests

7 participants