Add documentation for mapping from XML to Rust used by deserializer #369

Closed
wants to merge 5 commits into from

Mingun
Collaborator

@Mingun Mingun commented Mar 12, 2022

This is my vision of the further evolution of the serde integration in this crate. Some parts of it are debatable or may even be impossible to implement; this is the first iteration of what I would like to see. For now I'm opening this draft PR to:

  • share my vision
  • invite discussion
  • create a roadmap for the necessary fixes (marked with FIXME in doctests)

You can get the rustdoc documentation by running

cargo doc --features serialize --open

in the crate root and navigating to the quick_xml::de module.

Below is the (approximately) rendered version of this proposal:

Mapping XML to Rust types

Type names are never considered when deserializing, so you can name your
types as you wish. Other general rules:

  • a struct field name can be represented in XML only as an attribute name
    or an element name.
  • an enum variant name can be represented in XML only as an attribute name
    or an element name.
  • the unit struct, the unit type () and a unit enum variant can be
    deserialized from any valid XML content:
    • attribute and element names
    • attribute and element values
    • text or CDATA content
  • when deserializing, attribute names take precedence over element names.
    So if your XML has both an attribute and an element with the same name,
    the Rust field/variant will be deserialized from the attribute.

NOTE: examples marked with FIXME: do not work yet; any PRs that fix
them are welcome! The message after the marker is the test failure message.
Also, all those tests are marked with the ignore option, although they
compile. This is intentional, because rustdoc marks such blocks with
an exclamation mark, unlike no_run blocks.

To parse all these XMLs... ...use that Rust type
The root tag name does not matter.
<any-tag one="..." two="..."/>
<any-tag>
  <one>...</one>
  <two>...</two>
</any-tag>
<any-tag one="...">
  <two>...</two>
</any-tag>

NOTE: such XMLs are NOT supported because the deserializer will always
report a duplicated field error:

<any-tag field="...">
  <field>...</field>
</any-tag>

All these structs can be used to deserialize from the specified XML, depending
on the amount of information that you want to get:

// Get both elements/attributes
struct AnyName {
  one: T,
  two: U,
}
// Get only one element/attribute, ignore the other
struct AnyName {
  one: T,
}
// Ignore all attributes/elements
// You can also use the `()` type (unit type)
struct AnyName;

A structure where each XML attribute or child element is mapped to a field.
Each attribute or element name becomes the name of a field. The name of the
struct itself does not matter.

NOTE: XML allows you to have an attribute and an element with the
same name inside one element. Such XMLs can't be deserialized because
serde does not allow you to pass custom properties to the fields, so we
cannot tell on the Rust side whether the field should be deserialized from
the attribute or from the element.

Optional XML attributes/elements that you want to capture. The root tag name does not matter.
<any-tag optional="..."/>
<any-tag>
  <optional>...</optional>
</any-tag>
<any-tag/>

A structure with an optional field.

struct AnyName {
  optional: Option<T>,
}

When the XML attribute or element is present, type T will be deserialized
from the attribute value (which is a string) or from the element (which is a
string or a multi-mapping, i.e. a mapping which can have duplicated keys).
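
To make the term "multi-mapping" concrete, here is a minimal std-only sketch. It is not quick-xml code: the functions `values_of` and `optional` and the pair-list representation are assumptions made up for illustration. It models the children of one element as ordered (name, text) pairs, so the same name may occur several times, and it shows how an absent name naturally corresponds to `None`.

```rust
/// Hypothetical model: the children of one element as ordered (name, text)
/// pairs. Unlike a HashMap, the same name may appear more than once.
fn values_of<'a>(children: &[(&'a str, &'a str)], key: &str) -> Vec<&'a str> {
    children
        .iter()
        .filter(|(name, _)| *name == key)
        .map(|(_, text)| *text)
        .collect()
}

/// Hypothetical model of an `Option<T>` field: the first value stored
/// under `key`, or `None` when the key is absent.
fn optional<'a>(children: &[(&'a str, &'a str)], key: &str) -> Option<&'a str> {
    values_of(children, key).into_iter().next()
}

fn main() {
    let children = [("item", "a"), ("other", "x"), ("item", "b")];
    // The key "item" is duplicated, so two values come back.
    println!("{:?}", values_of(&children, "item"));
    // A missing key yields `None`, as for an absent optional field.
    println!("{:?}", optional(&children, "missing"));
}
```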

Text content, CDATA content

Text content and CDATA are mapped to any Rust type that can be deserialized
from a string, for example String, &str and so on.

NOTE: deserialization to non-owned types (i.e. types that borrow from the
input), such as &str, is possible only if you parse a document in the UTF-8
encoding and the text content does not contain escape sequences.
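
The reason behind this restriction can be illustrated with a small std-only sketch. The `unescape` function below is a hypothetical helper, not the crate's implementation, and it handles only the five predefined XML entities (anything else after `&` is copied through unchanged, which a real parser would not do). The point is only that text without escapes can be returned as a borrow of the input, while text with escapes has to be rebuilt in a new, owned buffer.

```rust
use std::borrow::Cow;

/// The five predefined XML entities and their replacements.
const ENTITIES: [(&str, char); 5] = [
    ("&lt;", '<'),
    ("&gt;", '>'),
    ("&amp;", '&'),
    ("&quot;", '"'),
    ("&apos;", '\''),
];

/// Hypothetical helper: returns a borrow when nothing has to change,
/// and an owned `String` as soon as one escape sequence is present.
fn unescape(text: &str) -> Cow<'_, str> {
    if !text.contains('&') {
        // No escapes: the input bytes are already the final value.
        return Cow::Borrowed(text);
    }
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(pos) = rest.find('&') {
        out.push_str(&rest[..pos]);
        let tail = &rest[pos..];
        match ENTITIES.iter().find(|(name, _)| tail.starts_with(*name)) {
            Some((name, ch)) => {
                out.push(*ch);
                rest = &tail[name.len()..];
            }
            // Simplification: unknown sequences are kept verbatim.
            None => {
                out.push('&');
                rest = &tail[1..];
            }
        }
    }
    out.push_str(rest);
    Cow::Owned(out)
}

fn main() {
    // Borrowed: a `&str` field could point directly into the input.
    println!("{:?}", unescape("plain text"));
    // Owned: the unescaped value does not exist anywhere in the input.
    println!("{:?}", unescape("a &lt; b &amp; c"));
}
```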

An XML with different root tag names.
<one field1="...">...</one>
<two field2="...">...</two>
<one>
  <field1>...</field1>
</one>
<two>
  <field2>...</field2>
</two>

An enum where each variant has the name of the root tag. The name of the enum
itself does not matter.

All these types can be used to deserialize from the specified XML, depending
on the amount of information that you want to get:

#[serde(rename_all = "snake_case")]
enum AnyName {
  One { field1: T },
  Two { field2: U },
}
type OtherType = ...;
#[serde(rename_all = "snake_case")]
enum AnyName {
  // `field1` content discarded
  One,
  // OtherType deserialized from the `field2` content
  Two(OtherType),
}
#[serde(rename_all = "snake_case")]
enum AnyName {
  One,
  // the <two> will be mapped to this
  #[serde(other)]
  Other,
}

You should have variants for all possible tag names in your enum or have
a #[serde(other)] variant.
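
As a rough mental model of this rule, the following std-only sketch maps tag names to variants by hand. It does not use serde; the functions `strict` and `lenient` are hypothetical names chosen for illustration, standing in for an enum without and with a `#[serde(other)]` variant, and the error text is made up.

```rust
#[derive(Debug, PartialEq)]
enum Choice {
    One,
    Two,
    Other,
}

/// Models an enum without a fallback variant: an unknown tag is an error.
fn strict(tag: &str) -> Result<Choice, String> {
    match tag {
        "one" => Ok(Choice::One),
        "two" => Ok(Choice::Two),
        unknown => Err(format!("unknown variant `{}`", unknown)),
    }
}

/// Models an enum with a fallback variant: unknown tags map to `Other`.
fn lenient(tag: &str) -> Choice {
    strict(tag).unwrap_or(Choice::Other)
}

fn main() {
    println!("{:?}", strict("one"));
    println!("{:?}", strict("three"));
    println!("{:?}", lenient("three"));
}
```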

<xs:choice> inside of the other element.

<any-tag field="...">
  <one>...</one>
</any-tag>
<any-tag field="...">
  <two>...</two>
</any-tag>
<any-tag>
  <field>...</field>
  <one>...</one>
</any-tag>
<any-tag>
  <two>...</two>
  <field>...</field>
</any-tag>
Names of the enum, struct, and struct field do not matter.
// FIXME: Custom("missing field `$flatten`")
#[serde(rename_all = "snake_case")]
enum Choice {
  One,
  Two,
}
struct AnyName {
  field: ...,

  // Creates problems while deserializing inner
  // types in many cases due to
  // https://github.com/serde-rs/serde/issues/1183
  // #[serde(flatten)]
  /// Field name is ignored if it is renamed to
  /// `$flatten`
  #[serde(rename = "$flatten")]
  any_name: Choice,
}

Due to the selected workaround you can have only one flatten field
in your structure. That will be checked at compile time by the
serde derive macro.

A sequence with a strict order, possibly with mixed content (text and tags).
<one>...</one>
text
<![CDATA[cdata]]>
<two>...</two>
<one>...</one>

All elements are mapped to a heterogeneous sequential type: a tuple or a named
tuple. Each element of the tuple should be deserializable from the nested
element content (...), except for enum types, which are deserialized
from the full element (<one>...</one>) so that they can use the element name
to choose the right variant:

// FIXME: Custom("invalid length 3, expected tuple
//                struct AnyName with 5 elements")
type One = ...;
type Two = ...;
#[derive(Debug, PartialEq, serde::Deserialize)]
struct AnyName(One, String, String, Two, One);
// FIXME: Custom("invalid length 3, expected
//                a tuple of size 5")
#[serde(rename_all = "snake_case")]
enum Choice {
  One,
}
type Two = ...;
type AnyName = (Choice, String, String, Two, Choice);
A sequence with a non-strict order, possibly with mixed content (text and tags).
<one>...</one>
text
<![CDATA[cdata]]>
<two>...</two>
<one>...</one>
A homogeneous sequence of elements with a fixed or dynamic size.
// FIXME: Unsupported("Invalid event for Enum,
//                     expecting `Text` or `Start`")
#[serde(rename_all = "snake_case")]
enum Choice {
  One,
  Two,
  #[serde(other)]
  Other,
}
type AnyName = [Choice; 5];
// FIXME: Custom("unknown variant `text`, expected
//                one of `one`, `two`, `$value`")
#[serde(rename_all = "snake_case")]
enum Choice {
  One,
  Two,
  #[serde(rename = "$value")]
  Other(String),
}
type AnyName = Vec<Choice>;
A sequence with a strict order, possibly with mixed content (text and tags), inside another element.
<any-tag>
  <one>...</one>
  text
  <![CDATA[cdata]]>
  <two>...</two>
  <one>...</one>
</any-tag>

A structure where all child elements are mapped to one field which has
a heterogeneous sequential type: a tuple or a named tuple. Each element of the
tuple should be deserializable from the nested element content
(...), except for enum types, which are deserialized from the full
element (<one>...</one>):

// FIXME: Custom("missing field `$flatten`")
type One = ...;
type Two = ...;
struct AnyName {
  // Not (yet?) supported by serde
  // https://github.com/serde-rs/serde/issues/1905
  // #[serde(flatten)]
  /// Field name is ignored if it is renamed to
  /// `$flatten`
  #[serde(rename = "$flatten")]
  any_name: (One, String, String, Two, One),
}
// FIXME: Custom("missing field `$flatten`")
type One = ...;
type Two = ...;
struct NamedTuple(One, String, String, Two, One);
struct AnyName {
  // Not (yet?) supported by serde
  // https://github.com/serde-rs/serde/issues/1905
  // #[serde(flatten)]
  /// Field name is ignored if it is renamed to
  /// `$flatten`
  #[serde(rename = "$flatten")]
  any_name: NamedTuple,
}
A sequence with a non-strict order, possibly with mixed content (text and tags), inside another element.
<any-tag>
  <one>...</one>
  text
  <![CDATA[cdata]]>
  <two>...</two>
  <one>...</one>
</any-tag>

A structure where all child elements are mapped to one field which has
a homogeneous sequential type: an array-like container. The container's item
type T should be deserializable from the nested element content (...),
unless it is an enum type, which is deserialized from the full
element (<one>...</one>):

// FIXME: Custom("missing field `$flatten`")
#[serde(rename_all = "snake_case")]
enum Choice {
  One,
  Two,
  #[serde(rename = "$value")]
  Other(String),
}
struct AnyName {
  // Not (yet?) supported by serde
  // https://github.com/serde-rs/serde/issues/1905
  // #[serde(flatten)]
  /// Field name is ignored if it is renamed to
  /// `$flatten`
  #[serde(rename = "$flatten")]
  any_name: [Choice; 5],
}
// FIXME: Custom("missing field `$flatten`")
#[serde(rename_all = "snake_case")]
enum Choice {
  One,
  Two,
  #[serde(rename = "$value")]
  Other(String),
}
struct AnyName {
  // Not (yet?) supported by serde
  // https://github.com/serde-rs/serde/issues/1905
  // #[serde(flatten)]
  /// Field name is ignored if it is renamed to
  /// `$flatten`
  #[serde(rename = "$flatten")]
  any_name: Vec<Choice>,
}

@Mingun Mingun added serde Issues related to mapping from Rust types to XML documentation Issues about improvements or bugs in documentation labels May 21, 2022
@RodogInfinite

I'm not sure if my approach is wrong or if the following scenario should be added to the considerations. The comments within the code block outline what works vs what would be preferred in my case. It would be nice if the need for the "InnerNested" struct could be removed entirely since, in my case, it creates the need for an extra step to access the Vec containing the "InnerNestedDetail" structs which diverges from the standard in the XML.

use serde::{Deserialize, Serialize};
use quick_xml::de::{from_str, DeError};


#[derive(Debug, Deserialize, Serialize)]
struct InnerNestedDetail {
    #[serde(rename="modification_date",default)]
    modification_date: String, // String just for example
    #[serde(rename="version",default)]
    version: f32,
    #[serde(rename="description",default)]
    description: String,
}


// Prefer to not need this struct
#[derive(Debug, Deserialize, Serialize)]
struct InnerNested{
    #[serde(rename = "Inner_Nested_Detail")] // not sure how this attribute's resulting functionality can be implemented in the inner nested field of the OuterTag struct 
   details: Vec<InnerNestedDetail>
}


#[derive(Debug, Deserialize, Serialize)]
pub struct OuterTag {
    identifier: String,
    version: f32,
    #[serde(rename = "Inner_Nested")] 
    inner_nested: InnerNested, // Prefer to have this field as Vec<InnerNestedDetail> and remove the InnerNested struct altogether. The Inner_Nested_Detail cannot be accessed without knowing its outer tag, InnerNested, first, but this adds an extra step for accessing the data when the structs are populated
}

fn parse_xml(xml_string:&str) -> Result<OuterTag,DeError> {
    let test: OuterTag =  from_str(xml_string)?;
    Ok(test)
}

fn main(){
    let xml_string = "<Outer_Tag>
            <identifier>Some Identifier</identifier>
            <version>2.0</version>
            <Inner_Nested>
                <Inner_Nested_Detail>
                    <modification_date>2022-04-20</modification_date>
                    <version>1.0</version>
                    <description>Initial version.</description>
                </Inner_Nested_Detail>
                <Inner_Nested_Detail>
                    <modification_date>2022-05-20</modification_date>
                    <version>2.0</version>
                    <description>Modified version.</description>
                </Inner_Nested_Detail>
            </Inner_Nested>
        </Outer_Tag>";
    let result = parse_xml(xml_string).unwrap();

    println!("{:#?}",result);

    // In order to access the inner nested details vec, an extra step is necessary 
    println!("\nInner Nested Details Vec Current Access:\n{:#?}",result.inner_nested.details);

    // Preferred vec access
    //println!("\nInner Nested Details Vec Preferred Access:\n{:#?}",result.inner_nested);


}

@Mingun
Collaborator Author

Mingun commented May 28, 2022

Because the inner_nested field represents the Inner_Nested tag of your XML, it is not possible to just skip it in a trivial mapping. But you can always write a simple wrapper and use it with #[serde(with)] to unpack the sequence from the container: see #365 (comment)

You can also look at https://lib.rs/crates/serde-query if you are looking for a more generic solution. I think I'll mention both alternatives in the final version of the doc.
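
Independently of how the data is deserialized, the extra access step can also be hidden on the Rust side. The sketch below is std-only and only an assumption about what might be acceptable here: it keeps the InnerNested wrapper but adds a hypothetical `details()` accessor on OuterTag. The structs are trimmed copies of the ones in the question, without the serde derives, so that the example compiles on its own.

```rust
#[derive(Debug)]
struct InnerNestedDetail {
    version: f32,
    description: String,
}

// Still mirrors the <Inner_Nested> tag, as the mapping requires.
#[derive(Debug)]
struct InnerNested {
    details: Vec<InnerNestedDetail>,
}

#[derive(Debug)]
struct OuterTag {
    identifier: String,
    inner_nested: InnerNested,
}

impl OuterTag {
    /// Hypothetical accessor: callers see the list directly and never
    /// have to name the intermediate `InnerNested` wrapper.
    fn details(&self) -> &[InnerNestedDetail] {
        &self.inner_nested.details
    }
}

fn main() {
    let outer = OuterTag {
        identifier: "Some Identifier".to_string(),
        inner_nested: InnerNested {
            details: vec![
                InnerNestedDetail { version: 1.0, description: "Initial version.".to_string() },
                InnerNestedDetail { version: 2.0, description: "Modified version.".to_string() },
            ],
        },
    };
    println!("{}: {:#?}", outer.identifier, outer.details());
}
```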

Mingun added a commit to Mingun/quick-xml that referenced this pull request Aug 26, 2022
dralley pushed a commit that referenced this pull request Aug 26, 2022
New examples will be added in #369
@Mingun Mingun mentioned this pull request Nov 2, 2022
@dralley
Collaborator

dralley commented Nov 19, 2022

@Mingun Is this PR still relevant?

@Mingun
Collaborator Author

Mingun commented Nov 19, 2022

Technically, I'll open a new PR because I cannot change this one: it is from another repository and GitHub doesn't allow me to change it.

@Mingun Mingun closed this Nov 19, 2022
Mingun added a commit to Mingun/quick-xml that referenced this pull request Dec 21, 2022
Mingun added a commit to Mingun/quick-xml that referenced this pull request Dec 24, 2022
Mingun added a commit to Mingun/quick-xml that referenced this pull request Dec 25, 2022
Mingun added a commit to Mingun/quick-xml that referenced this pull request Dec 25, 2022
Labels
documentation Issues about improvements or bugs in documentation serde Issues related to mapping from Rust types to XML
3 participants