Optional unescaping with serde #581
Comments
I'll accept a PR that adds the ability to set up an entity resolver for |
@Mingun I'm not really sure how to go about this. I have created a fork that implements a resolver, but it's not going to be able to access the entities defined in the document because that parsing is handled outside the |
Wow! Your implementation looks promising! I think it would be better if you created a draft PR so we can discuss the actual implementation in context. You need to capture Lines 2745 to 2765 in 2e9123a
For that change the

    /// Used to resolve unknown entities while parsing
    ///
    /// # Example
    /// Add an example here -- you can adapt existing custom_entities.rs example
    pub trait EntityResolver {
        /// Called on contents of [`Event::DocType`] to capture declared entities.
        /// Can be called multiple times, for each parsed `<!DOCTYPE >` declaration.
        fn capture(&mut self, doctype: BytesText);

        /// Called when an entity needs to be resolved.
        ///
        /// `None` is returned if a suitable value can not be found.
        /// In that case an [`Error::UnrecognizedSymbol`] will be returned.
        fn resolve<'entity>(&'entity self, entity: &str) -> Option<&'entity str>;
    }
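To make the proposed trait more concrete, here is a minimal sketch of one possible implementor. It is not the code from the fork or from quick-xml itself. To stay compilable without the crate, it redefines the trait locally and assumes `capture` receives the DOCTYPE contents as a plain `&str` rather than `BytesText`. The `DocTypeEntities` type is hypothetical, and the declaration parsing is deliberately naive.

```rust
use std::collections::HashMap;

/// Local stand-in for the trait proposed above.
/// ASSUMPTION: `capture` takes `&str` instead of quick-xml's `BytesText`
/// so that this sketch compiles without the crate.
pub trait EntityResolver {
    fn capture(&mut self, doctype: &str);
    fn resolve<'entity>(&'entity self, entity: &str) -> Option<&'entity str>;
}

/// Hypothetical resolver that remembers `<!ENTITY name "value">` declarations.
#[derive(Default)]
pub struct DocTypeEntities {
    entities: HashMap<String, String>,
}

impl EntityResolver for DocTypeEntities {
    fn capture(&mut self, doctype: &str) {
        let mut rest = doctype;
        while let Some(start) = rest.find("<!ENTITY") {
            rest = &rest[start + "<!ENTITY".len()..];
            // Naive: treat the next `>` as the end of the declaration, so a
            // value that itself contains `>` is not handled correctly.
            let Some(end) = rest.find('>') else { break };
            let decl = rest[..end].trim();
            rest = &rest[end + 1..];

            // Split `name "value"` at the first whitespace.
            let Some((name, value)) = decl.split_once(char::is_whitespace) else {
                continue;
            };
            let value = value.trim();
            // Accept both "..." and '...' quoting; anything else (for example
            // parameter entities such as `% name "value"`) is skipped.
            let quote = match value.chars().next() {
                Some(q @ ('"' | '\'')) => q,
                _ => continue,
            };
            if value.len() >= 2 && value.ends_with(quote) {
                self.entities
                    .insert(name.to_string(), value[1..value.len() - 1].to_string());
            }
        }
    }

    fn resolve<'entity>(&'entity self, entity: &str) -> Option<&'entity str> {
        // `None` signals an unknown entity to the caller.
        self.entities.get(entity).map(String::as_str)
    }
}
```

Because `capture` takes `&mut self` and may be called more than once, the map simply accumulates declarations, and a later declaration of the same name overwrites the earlier one in this sketch.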
|
I started this a bit too late in the night 😅 It makes a lot more sense now that my brain is a bit more functional. I have opened a draft PR #583 |
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [quick-xml](https://github.com/tafia/quick-xml) | dependencies | patch | `0.28.1` -> `0.28.2` |

### Release Notes

<details>
<summary>tafia/quick-xml</summary>

### [`v0.28.2`](https://github.com/tafia/quick-xml/blob/HEAD/Changelog.md#0282----2023-04-12)

[Compare Source](tafia/quick-xml@v0.28.1...v0.28.2)

##### New Features

- [#581]: Allow `Deserializer` to set `quick_xml::de::EntityResolver` for resolving unknown entities that would otherwise cause the parser to return an [`EscapeError::UnrecognizedSymbol`] error.

##### Misc Changes

- [#584]: Export `EscapeError` from the crate
- [#581]: Relax requirements for the `unescape_*` set of functions: they now use `FnMut` instead of `Fn` for `resolve_entity` parameters, like `Iterator::map` from `std`.

[#581]: tafia/quick-xml#581
[#584]: tafia/quick-xml#584

</details>

Reviewed-on: https://codeberg.org/Calciumdibromid/CaBr2/pulls/1862
Otherwise consecutive `Text` events (possible when they are delimited by Comment or PI events, which are skipped) will be merged but not trimmed. That leads to a `Text` event being returned when `deserialize_struct` or `deserialize_map` is called, which triggers a `DeError::ExpectedStart` error. The incorrect trim behavior was introduced in tafia#581, when the DocType event began to be processed.
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [quick-xml](https://github.com/tafia/quick-xml) | dependencies | minor | `0.28.2` -> `0.29.0` |

### Release Notes

<details>
<summary>tafia/quick-xml</summary>

### [`v0.29.0`](https://github.com/tafia/quick-xml/blob/HEAD/Changelog.md#0290----2023-06-13)

[Compare Source](tafia/quick-xml@v0.28.2...v0.29.0)

##### New Features

- [#601]: Add a `serde_helper` module to the crate root with some useful utility functions, and document the use of an enum's unit variants as the text content of an element.
- [#606]: Implement indentation for `AsyncWrite` trait implementations.

##### Bug Fixes

- [#603]: Fix a regression from [#581] where an XML comment or a processing instruction between a `<!DOCTYPE>` and the root element in the file breaks deserialization of structs by returning `DeError::ExpectedStart`
- [#608]: Return a new error `Error::EmptyDocType` on an empty doctype instead of crashing because of a debug assertion.

##### Misc Changes

- [#594]: Add a helper macro to help deserialize internally tagged enums with Serde, which doesn't work out of the box due to serde limitations.

[#581]: tafia/quick-xml#581
[#594]: tafia/quick-xml#594
[#601]: tafia/quick-xml#601
[#603]: tafia/quick-xml#603
[#606]: tafia/quick-xml#606
[#608]: tafia/quick-xml#608

</details>

Reviewed-on: https://codeberg.org/Calciumdibromid/CaBr2/pulls/1940
Currently, trying to use `quick_xml::de::from_str` with XML that uses `entity` tags in the doctype fails to parse.

Could a feature flag or an alternate method be added that disables unescaping and instead always just returns the raw string?
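To illustrate the failure mode described here, the following sketch shows how entity expansion that only knows the predefined XML entities rejects a document-defined one, and how a caller-supplied resolver avoids that. It does not reproduce quick-xml's internals. `expand_entities`, `predefined` and `UnrecognizedEntity` are hypothetical names invented for this example, and numeric character references such as `&#60;` are not handled.

```rust
/// Hypothetical error for an entity reference the resolver does not know.
#[derive(Debug, PartialEq)]
pub struct UnrecognizedEntity(pub String);

/// The five entities predefined by XML.
pub fn predefined(name: &str) -> Option<&'static str> {
    match name {
        "lt" => Some("<"),
        "gt" => Some(">"),
        "amp" => Some("&"),
        "apos" => Some("'"),
        "quot" => Some("\""),
        _ => None,
    }
}

/// Replaces every `&name;` reference in `raw` with the value returned by
/// `resolve`, failing on the first name that `resolve` does not know.
pub fn expand_entities<'a>(
    raw: &str,
    mut resolve: impl FnMut(&str) -> Option<&'a str>,
) -> Result<String, UnrecognizedEntity> {
    let mut out = String::with_capacity(raw.len());
    let mut rest = raw;
    while let Some(amp) = rest.find('&') {
        // Copy the literal text before the reference.
        out.push_str(&rest[..amp]);
        let after = &rest[amp + 1..];
        // A `&` without a closing `;` is reported as an error too.
        let Some(semi) = after.find(';') else {
            return Err(UnrecognizedEntity(after.to_string()));
        };
        let name = &after[..semi];
        match resolve(name) {
            Some(value) => out.push_str(value),
            None => return Err(UnrecognizedEntity(name.to_string())),
        }
        rest = &after[semi + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
```

With only `predefined` as the resolver, text such as `&msg;` produces an error, which mirrors the behaviour reported in this issue. A resolver that first tries `predefined` and then falls back to entities collected from the doctype would let the same text expand successfully.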