-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Percent-encoding of query wrong for ISO-2022-JP #557
Comments
We effectively put the code points into a buffer when parsing them then call the encoder once on the whole buffer when transitioning from query state to fragment state or hitting the "EOF code point". This makes more sense to me than the current spec when using ISO-2022-JP. |
Alternatively, encoding as a stream should fix this, right? |
I think we should probably do both. I think we want to use a buffer so the URL parser doesn't itself have to keep an encoder alive, but then we do need to process things code-point-for-code-point to do error handling correctly. (I could see an argument for letting the URL parser hold onto an encoder though and passing that into percent-encoding and friends rather than an encoding.) I filed whatwg/encoding#235 for Encoding standard cleanup as whatever we do here we can no longer use the encode hook it provides. (Edit: I'm not longer sure about this either as it seems it wouldn't reset ISO-2022-JP encoder state correctly upon encountering an error. As mentioned in the Encoding issue we probably need to adjust the encoder.) |
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Depends on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158. Fixes #557.
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Builds on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158. Fixes #557.
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Builds on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158. Fixes #557.
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Builds on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158 and web-platform-tests/wpt#26317. Fixes #557.
If you look at the number of state transitions (hint: count
%1B
instances) in the example below you can see that invoking the encoder code-point-by-code-point does not match implementations.There might be a related thing here, discovered in web-platform-tests/wpt#26158 by @JKingweb, that implementations skip invoking encoders if the input is entirely in the ASCII range. (Again, something that uniquely applies to ISO-2022-JP. Form submission should be checked here as well before deciding on something.)
The text was updated successfully, but these errors were encountered: