what's BytesStart::unescaped for? #341

Closed
scottlamb opened this issue Nov 13, 2021 · 3 comments

@scottlamb

I'm trying to switch to quick-xml and am struggling to understand the API for just getting strings (for tag names, attribute names, attribute values, and text/cdata nodes).

I saw that the caller must unescape text/CDATA/attribute values, and saw this comment saying that the caller must decode the character encoding.

Next question: what is BytesStart::unescaped for? It says it handles escapes/entities like &lt; but that doesn't make sense to me. Is this the right grammar for the XML tag name from the spec?

NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
Name ::= NameStartChar (NameChar)*

It looks like & (aka #x26) isn't allowed. So why unescape? Or what am I missing?
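The grammar quoted above can be checked mechanically. The following is a minimal sketch using only the Rust standard library; the function names are hypothetical and are not part of quick-xml. It transcribes the three productions as written and confirms that a string containing & can never be a Name.

```rust
/// Hypothetical helper: true if `c` matches the NameStartChar production.
fn is_name_start_char(c: char) -> bool {
    matches!(c,
        ':' | 'A'..='Z' | '_' | 'a'..='z'
        | '\u{C0}'..='\u{D6}' | '\u{D8}'..='\u{F6}' | '\u{F8}'..='\u{2FF}'
        | '\u{370}'..='\u{37D}' | '\u{37F}'..='\u{1FFF}'
        | '\u{200C}'..='\u{200D}' | '\u{2070}'..='\u{218F}'
        | '\u{2C00}'..='\u{2FEF}' | '\u{3001}'..='\u{D7FF}'
        | '\u{F900}'..='\u{FDCF}' | '\u{FDF0}'..='\u{FFFD}'
        | '\u{10000}'..='\u{EFFFF}')
}

/// Hypothetical helper: true if `c` matches the NameChar production.
fn is_name_char(c: char) -> bool {
    is_name_start_char(c)
        || matches!(c,
            '-' | '.' | '0'..='9' | '\u{B7}'
            | '\u{300}'..='\u{36F}' | '\u{203F}'..='\u{2040}')
}

/// Hypothetical helper: true if `s` matches Name ::= NameStartChar (NameChar)*.
fn is_name(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        // The first character must be a NameStartChar, the rest NameChars.
        Some(first) if is_name_start_char(first) => chars.all(is_name_char),
        // The empty string is not a Name.
        _ => false,
    }
}
```

Under this sketch, is_name("xs:element") holds while is_name("a&amp;b") does not, because & (#x26) falls in none of the listed ranges.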

@Mingun
Collaborator

Mingun commented May 25, 2022

Actually, that method gives you the entire internal content of a tag (i.e. name + attributes). As such, it may have been intended to expand entities in all attributes in one pass. We should probably remove it and provide other ways to get normalized attribute values (see also #371).
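To make "name + attributes" concrete, here is a small sketch, using only the Rust standard library, of how the internal content of a start tag divides into the two parts. split_tag_content is a hypothetical helper for illustration, not a quick-xml function, and it assumes the content has already been extracted from between < and >.

```rust
/// Hypothetical helper: splits the internal content of a start tag into
/// the tag name and the remaining, still-escaped attribute text.
fn split_tag_content(content: &str) -> (&str, &str) {
    // XML whitespace is limited to space, tab, CR and LF.
    let is_xml_space = |c: char| matches!(c, ' ' | '\t' | '\r' | '\n');
    match content.find(is_xml_space) {
        // The name ends at the first whitespace; attributes follow it.
        Some(i) => (&content[..i], content[i..].trim_start_matches(is_xml_space)),
        // No whitespace: the tag has a name and no attributes.
        None => (content, ""),
    }
}
```

Only the second part can legally contain entity references, and only inside the quoted attribute values.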

@Mingun
Collaborator

Mingun commented May 25, 2022

Duplicate of #118

@Mingun Mingun marked this as a duplicate of #118 May 25, 2022
@Mingun Mingun closed this as not planned May 25, 2022
@kornelski

I suggest deprecating or removing this method. I don't see any valid use for it. Entities are not allowed in tag names, and unescaping the attributes as a whole allows injection:

<foo real="1&quot; fake=&quot;2"/>

unescapes to:

<foo real="1" fake="2"/>
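The difference between the two orders of operation can be sketched with the Rust standard library alone. Everything below is hypothetical illustration code, not quick-xml's API: unescape expands only the five predefined entities, and raw_attributes is a deliberately naive splitter that handles only double-quoted values.

```rust
/// Hypothetical helper: expands only the five predefined XML entities.
fn unescape(s: &str) -> String {
    s.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&apos;", "'")
        // `&amp;` goes last so that "&amp;lt;" yields "&lt;", not "<".
        .replace("&amp;", "&")
}

/// Hypothetical helper: collects raw `name="value"` pairs from tag content.
/// Handles only double-quoted values and reports no errors.
fn raw_attributes(content: &str) -> Vec<(String, String)> {
    let mut out = Vec::new();
    // Skip the tag name, which ends at the first space.
    let mut rest = match content.split_once(' ') {
        Some((_name, attrs)) => attrs,
        None => return out,
    };
    while let Some((name, after_eq)) = rest.split_once("=\"") {
        match after_eq.split_once('"') {
            Some((value, tail)) => {
                out.push((name.trim().to_string(), value.to_string()));
                rest = tail;
            }
            None => break, // unterminated value
        }
    }
    out
}

/// Unsafe order: unescape the whole tag content, then split into attributes.
fn unescape_then_parse(content: &str) -> Vec<(String, String)> {
    raw_attributes(&unescape(content))
}

/// Safe order: split into attributes first, then unescape each value.
fn parse_then_unescape(content: &str) -> Vec<(String, String)> {
    raw_attributes(content)
        .into_iter()
        .map(|(name, value)| (name, unescape(&value)))
        .collect()
}
```

For the tag content in the example above, unescape_then_parse reports two attributes, real and fake, while parse_then_unescape reports a single attribute real whose value contains the literal quote characters.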

kornelski added a commit to kornelski/quick-xml that referenced this issue Jun 19, 2022