Commit 3d5ed69

Correctly process deserialization of xs:list from empty elements (<tag/> or <tag></tag>)
Document the MapValueDeserializer and SeqItemDeserializer. The deserializers do not yet fully follow their descriptions, but that will be fixed in the next commits
1 parent 77cdb0f commit 3d5ed69

3 files changed, +112 -28 lines changed

Changelog.md

+4
@@ -20,6 +20,9 @@ MSRV bumped to 1.56! Crate now uses Rust 2021 edition.
 
 ### Bug Fixes
 
+- [#660]: Fixed incorrect deserialization of `xs:list`s from empty tags (`<tag/>`
+  or `<tag></tag>`). Previously a `DeError::UnexpectedEof` was returned in that case
+
 ### Misc Changes
 
 - [#643]: Bumped MSRV to 1.56. In practice the previous MSRV was incorrect in many cases.
@@ -37,6 +40,7 @@ MSRV bumped to 1.56! Crate now uses Rust 2021 edition.
 [#643]: https://github.com/tafia/quick-xml/pull/643
 [#649]: https://github.com/tafia/quick-xml/pull/646
 [#651]: https://github.com/tafia/quick-xml/pull/651
+[#660]: https://github.com/tafia/quick-xml/pull/660
 
 
 ## 0.30.0 -- 2023-07-23
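
As a rough illustration of the behaviour the new changelog entry describes, the sketch below parses an `xs:list` from the text content of an element using only the standard library. The helper name `parse_xs_list` and the choice to split on whitespace are assumptions made for this example and are not part of the crate's API. The point is only that empty content (as in `<tag/>` or `<tag></tag>`) can yield an empty list instead of an error.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: parses the text content of an element as an
/// `xs:list` of `u32` values separated by whitespace.
fn parse_xs_list(text: &str) -> Result<Vec<u32>, ParseIntError> {
    // An empty string produces no items, so the result is an empty list.
    text.split_whitespace().map(str::parse::<u32>).collect()
}

fn main() {
    // Content of `<tag>1 2 3</tag>`
    assert_eq!(parse_xs_list("1 2 3"), Ok(vec![1, 2, 3]));
    // Content of `<tag/>` or `<tag></tag>`
    assert_eq!(parse_xs_list(""), Ok(vec![]));
}
```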

src/de/map.rs

+102-26
@@ -373,6 +373,52 @@ macro_rules! forward {
 /// A deserializer for a value of map or struct. That deserializer processes
 /// events for primitive types and sequences slightly differently than
 /// a [`Deserializer`].
+///
+/// This deserializer can see two kinds of events at the start:
+/// - [`DeEvent::Text`]
+/// - [`DeEvent::Start`]
+///
+/// which represent two possible variants of items:
+/// ```xml
+/// <item>A tag item</item>
+/// A text item
+/// <yet another="tag item"/>
+/// ```
+///
+/// This deserializer is very similar to a [`SeqItemDeserializer`]. The only difference
+/// is in the `deserialize_seq` method. This deserializer will act as an iterator
+/// over tags / text within its parent tag, whereas the [`SeqItemDeserializer`]
+/// will represent sequences as an `xs:list`.
+///
+/// This deserializer processes items as follows:
+/// - primitives (numbers, booleans, strings, characters) are deserialized either
+///   from text content, or unwrapped from one level of a tag. So, `123` and
+///   `<int>123</int>` can both be deserialized into a `u32`;
+/// - `Option`:
+///   - empty text of [`DeEvent::Text`] is deserialized as `None`;
+///   - everything else is deserialized as `Some` using the same deserializer,
+///     including `<tag/>` or `<tag></tag>`;
+/// - units (`()`) and unit structs consume the whole text or element subtree;
+/// - newtype structs are deserialized by forwarding deserialization of the inner type
+///   with the same deserializer;
+/// - sequences, tuples and tuple structs are deserialized by iterating within the
+///   parent tag and deserializing each tag or text content using [`SeqItemDeserializer`];
+/// - structs and maps are deserialized using a new instance of [`MapAccess`];
+/// - enums:
+///   - in case of a [`DeEvent::Text`] event the text content is deserialized as
+///     a `$text` variant. Enum content is deserialized from the text using
+///     [`SimpleTypeDeserializer`];
+///   - in case of a [`DeEvent::Start`] event the tag name is deserialized as
+///     an enum tag, and the content inside is deserialized as the enum content.
+///     Depending on the variant kind, deserialization is performed as:
+///     - unit variants: consuming text content or a subtree;
+///     - newtype variants: forward deserialization to the inner type using
+///       this deserializer;
+///     - tuple variants: call [`deserialize_tuple`] of this deserializer;
+///     - struct variants: call [`deserialize_struct`] of this deserializer.
+///
+/// [`deserialize_tuple`]: #method.deserialize_tuple
+/// [`deserialize_struct`]: #method.deserialize_struct
 struct MapValueDeserializer<'de, 'a, 'm, R, E>
 where
     R: XmlRead<'de>,
@@ -714,7 +760,59 @@ where
 
 ////////////////////////////////////////////////////////////////////////////////////////////////////
 
-/// A deserializer for a single item of a sequence.
+/// A deserializer for a single item of a mixed sequence of tags and text.
+///
+/// This deserializer can see two kinds of events at the start:
+/// - [`DeEvent::Text`]
+/// - [`DeEvent::Start`]
+///
+/// which represent two possible variants of items:
+/// ```xml
+/// <item>A tag item</item>
+/// A text item
+/// <yet another="tag item"/>
+/// ```
+///
+/// This deserializer is very similar to a [`MapValueDeserializer`]. The only difference
+/// is in the `deserialize_seq` method. This deserializer will perform deserialization
+/// from the textual content (the text itself in case of a [`DeEvent::Text`] event
+/// and the text between tags in case of a [`DeEvent::Start`] event), whereas
+/// the [`MapValueDeserializer`] will iterate over tags / text within its parent tag.
+///
+/// This deserializer processes items as follows:
+/// - primitives (numbers, booleans, strings, characters) are deserialized either
+///   from text content, or unwrapped from one level of a tag. So, `123` and
+///   `<int>123</int>` can both be deserialized into a `u32`;
+/// - `Option`:
+///   - empty text of [`DeEvent::Text`] is deserialized as `None`;
+///   - everything else is deserialized as `Some` using the same deserializer,
+///     including `<tag/>` or `<tag></tag>`;
+/// - units (`()`) and unit structs consume the whole text or element subtree;
+/// - newtype structs are deserialized as tuple structs with one element;
+/// - sequences, tuples and tuple structs are deserialized using [`SimpleTypeDeserializer`]
+///   (this is the difference):
+///   - in case of a [`DeEvent::Text`] event the text content is passed to the deserializer directly;
+///   - in case of a [`DeEvent::Start`] event the start and end tags are stripped,
+///     and the text between them is passed to [`SimpleTypeDeserializer`]. If the tag
+///     contains something other than text, an error is returned, but if it
+///     contains text followed by something else (for example, `<item>text<tag/></item>`),
+///     then the trailing content is just ignored;
+/// - structs and maps are deserialized using a new [`MapAccess`];
+/// - enums:
+///   - in case of a [`DeEvent::Text`] event the text content is deserialized as
+///     a `$text` variant. Enum content is deserialized from the text using
+///     [`SimpleTypeDeserializer`];
+///   - in case of a [`DeEvent::Start`] event the tag name is deserialized as
+///     an enum tag, and the content inside is deserialized as the enum content.
+///     Depending on the variant kind, deserialization is performed as:
+///     - unit variants: consuming text content or a subtree;
+///     - newtype variants: forward deserialization to the inner type using
+///       this deserializer;
+///     - tuple variants: deserialize it as an `xs:list`;
+///     - struct variants: call [`deserialize_struct`] of this deserializer.
+///
+/// [`deserialize_tuple`]: #method.deserialize_tuple
+/// [`deserialize_struct`]: #method.deserialize_struct
 struct SeqItemDeserializer<'de, 'a, 'm, R, E>
 where
     R: XmlRead<'de>,
@@ -783,34 +881,12 @@ where
     /// ...
     /// </>
     /// ```
-    fn deserialize_seq<V>(self, visitor: V) -> Result<V::Value, Self::Error>
+    fn deserialize_seq<V>(mut self, visitor: V) -> Result<V::Value, Self::Error>
     where
         V: Visitor<'de>,
     {
-        match self.map.de.next()? {
-            DeEvent::Text(e) => {
-                SimpleTypeDeserializer::from_text_content(e).deserialize_seq(visitor)
-            }
-            // This is a sequence element. We cannot treat it as another flatten
-            // sequence if type will require `deserialize_seq` We instead forward
-            // it to `xs:simpleType` implementation
-            DeEvent::Start(e) => {
-                let value = match self.map.de.next()? {
-                    DeEvent::Text(e) => {
-                        SimpleTypeDeserializer::from_text_content(e).deserialize_seq(visitor)
-                    }
-                    e => Err(DeError::Unsupported(
-                        format!("unsupported event {:?}", e).into(),
-                    )),
-                };
-                // TODO: May be assert that here we expect only matching closing tag?
-                self.map.de.read_to_end(e.name())?;
-                value
-            }
-            // SAFETY: we use that deserializer only when Start(element) or Text
-            // event was peeked already
-            _ => unreachable!(),
-        }
+        let text = self.read_string()?;
+        SimpleTypeDeserializer::from_text(text).deserialize_seq(visitor)
     }
 
     #[inline]
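
The change above replaces event-by-event matching with "read the item as a string, then parse that string as an `xs:list`". The sketch below models that idea with toy types and the standard library only. `Event`, `skip_to_end`, `read_string` and `read_list` are hypothetical stand-ins written for this example and do not reproduce the crate's internals. The model follows the documented rules: a text item is used directly, an empty element gives an empty string, and anything trailing the text inside an element is ignored.

```rust
use std::collections::VecDeque;

/// Toy stand-in for the events a deserializer sees (hypothetical, not `DeEvent`).
#[derive(Debug, PartialEq)]
enum Event {
    Start(String),
    End(String),
    Text(String),
}

/// Skips events until the end tag that closes the element called `name`.
fn skip_to_end(events: &mut VecDeque<Event>, name: &str) -> Result<(), String> {
    let mut depth = 0usize;
    while let Some(event) = events.pop_front() {
        match event {
            Event::Start(_) => depth += 1,
            Event::End(end) if depth == 0 => {
                return if end == name {
                    Ok(())
                } else {
                    Err(format!("mismatched end tag `{}`", end))
                };
            }
            // Only reached when depth > 0, so this cannot underflow.
            Event::End(_) => depth -= 1,
            Event::Text(_) => {}
        }
    }
    Err("unexpected end of input".to_string())
}

/// Reads the textual content of the next item: either the text itself, or
/// the text between a start tag and its end tag. An empty element gives "".
fn read_string(events: &mut VecDeque<Event>) -> Result<String, String> {
    match events.pop_front() {
        Some(Event::Text(text)) => Ok(text),
        Some(Event::Start(name)) => match events.pop_front() {
            // `<tag/>` or `<tag></tag>`: no text between the tags.
            Some(Event::End(end)) if end == name => Ok(String::new()),
            // Text comes first: keep it and ignore whatever trails it.
            Some(Event::Text(text)) => {
                skip_to_end(events, &name)?;
                Ok(text)
            }
            other => Err(format!("unsupported event {:?}", other)),
        },
        other => Err(format!("unexpected event {:?}", other)),
    }
}

/// Parses the item as a whitespace-separated list of numbers (an assumption
/// about the delimiter, made for this sketch).
fn read_list(events: &mut VecDeque<Event>) -> Result<Vec<u32>, String> {
    let text = read_string(events)?;
    text.split_whitespace()
        .map(|item| item.parse::<u32>().map_err(|e| e.to_string()))
        .collect()
}

fn main() {
    let mut empty = VecDeque::from(vec![
        Event::Start("list".to_string()),
        Event::End("list".to_string()),
    ]);
    // An empty element is an empty list, not an error.
    assert_eq!(read_list(&mut empty), Ok(vec![]));
}
```

In this model the empty-element case falls out of `read_string` returning an empty string, so the list parser needs no special handling for it.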

src/de/simple_type.rs

+6-2
@@ -495,13 +495,17 @@ pub struct SimpleTypeDeserializer<'de, 'a> {
 
 impl<'de, 'a> SimpleTypeDeserializer<'de, 'a> {
     /// Creates a deserializer from a value that is possibly borrowed from the input
-    pub fn from_text_content(value: Text<'de>) -> Self {
-        let content = match value.text {
+    pub fn from_text(text: Cow<'de, str>) -> Self {
+        let content = match text {
             Cow::Borrowed(slice) => CowRef::Input(slice.as_bytes()),
             Cow::Owned(content) => CowRef::Owned(content.into_bytes()),
         };
         Self::new(content, false, Decoder::utf8())
     }
+    /// Creates a deserializer from a value that is possibly borrowed from the input
+    pub fn from_text_content(value: Text<'de>) -> Self {
+        Self::from_text(value.text)
+    }
 
     /// Creates a deserializer from a part of value at specified range
     #[allow(clippy::ptr_arg)]
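
The refactoring in this file splits one constructor in two: `from_text` accepts a `Cow<'de, str>` directly, and `from_text_content` becomes a thin wrapper that unwraps the event first. A minimal standard-library sketch of that pattern follows. `Content` and `Text` here are simplified stand-ins (assumptions for this example) for the crate-internal `CowRef` and text event types, and the free functions stand in for the associated functions shown in the diff.

```rust
use std::borrow::Cow;

/// Simplified stand-in for the crate-internal `CowRef` type (hypothetical).
#[derive(Debug, PartialEq)]
enum Content<'de> {
    /// Bytes borrowed from the input, so no copy is made.
    Input(&'de [u8]),
    /// Bytes owned by the deserializer.
    Owned(Vec<u8>),
}

/// Simplified stand-in for the text event; only the `text` field matters here.
struct Text<'de> {
    text: Cow<'de, str>,
}

/// Builds the content from a string that may or may not borrow from the input.
fn from_text<'de>(text: Cow<'de, str>) -> Content<'de> {
    match text {
        Cow::Borrowed(slice) => Content::Input(slice.as_bytes()),
        Cow::Owned(content) => Content::Owned(content.into_bytes()),
    }
}

/// Thin wrapper kept for callers that still hold a text event.
fn from_text_content<'de>(value: Text<'de>) -> Content<'de> {
    from_text(value.text)
}

fn main() {
    let borrowed = from_text(Cow::Borrowed("1 2 3"));
    assert_eq!(borrowed, Content::Input("1 2 3".as_bytes()));

    let event = Text { text: Cow::Owned(String::from("4 5")) };
    assert_eq!(from_text_content(event), Content::Owned(b"4 5".to_vec()));
}
```

Accepting a `Cow` lets a caller that has already assembled a `String` (for example, text read from between two tags) reuse the same constructor without wrapping it in an event.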
