Handling of null bytes #1429

Closed
Zapotek opened this issue Feb 17, 2016 · 8 comments

Comments

@Zapotek

Zapotek commented Feb 17, 2016

Hello,

Is there a reason why null bytes aren't encoded as &#0; instead of raising an exception?

For example, the following:

Nokogiri::XML( "<blah>stuff</blah>" ).css( 'blah' ).first.content = "\0"

results in:

/home/zapotek/.rvm/gems/ruby-2.3.0/gems/nokogiri-1.6.8.rc2/lib/nokogiri/xml/node.rb:416:in `encode_special_chars': string contains null byte (ArgumentError)
        from /home/zapotek/.rvm/gems/ruby-2.3.0/gems/nokogiri-1.6.8.rc2/lib/nokogiri/xml/node.rb:416:in `content='
        from tmp/gh/672/nokogiri.rb:3:in `<main>'

Also, they seem to get ignored during parsing as well:

puts Nokogiri::XML( "<blah>test &#0;</blah>" ).to_xml
# => "<?xml version=\"1.0\"?>\n<blah>test </blah>\n"

Cheers

@flavorjones
Member

@Zapotek, thanks for asking this question.

However, I'm not sure how to reproduce what you're seeing. The one line you've provided runs fine for me, without an exception.

When running on MRI, and therefore using libxml2, C strings are used under the hood, which may explain some aspect of null-byte "mishandling." Specifically, a C string is null-terminated, and so a null byte in the middle of a string causes premature termination of parsing. But that doesn't appear to be what you're trying to demonstrate above, so I'd love to understand better what you're seeing and how to reproduce it.
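The truncation described above can be imitated in plain Ruby, without Nokogiri or libxml2. This is a minimal sketch, assuming that `String#unpack1` with the `Z*` directive is an acceptable stand-in for how C code reads a null-terminated buffer; the variable names are illustrative only.

```ruby
# A Ruby String carries an explicit length, so an embedded null byte is just data.
payload = "test\0hidden"
payload.bytesize            # 11 bytes, null byte included

# The "Z*" directive reads up to the first null byte, which is how C's
# string functions (strlen, strcpy, ...) would see the same buffer.
as_c_string = payload.unpack1("Z*")

puts as_c_string.inspect    # prints "test"
```

Everything after the first null byte is invisible to code that treats the buffer as a C string, which matches the "premature termination" mentioned above.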

@Zapotek
Author

Zapotek commented Feb 17, 2016

Are you using 1.6.8.rc2? Because null bytes seem to just get ignored without warning on 1.6.7.

@flavorjones
Member

Ah, it wasn't clear that you were on 1.6.8.rc2 (without looking closely at your error messages).

This is likely related to the fact that Ruby's internal strings were null-terminated before 2.2.0, but are no longer guaranteed to be. As a result, we now explicitly use null-terminated strings internally in places where we previously didn't, in order to match libxml2's semantics.

Can you help me understand the real-world problem you're encountering? It may still be possible for us to change this behavior, though at this point it's not obvious to me how, or why, we should do that. Can you explain a bit about the document you're parsing or the objective you're trying to accomplish?

@Zapotek
Author

Zapotek commented Feb 17, 2016

In my case, I'm generating an XML report of a web application's security scan from Arachni.

Some vulnerabilities can only be identified by using null-terminated strings (ironically enough, to trigger the same behavior causing this bug) and the relevant payloads (which can contain null bytes) need to be reported in order for the user to reproduce and verify the reported vulnerabilities.

EDIT: If there's no support for null bytes in Nokogiri, then I either can't generate the report or will end up providing inaccurate data to my users, thus preventing them from properly verifying and fixing the reported issues.

@flavorjones
Member

@larskanis - any thoughts on this? You're more familiar with the StringValueCStr changes. This particular issue stems from https://github.com/sparklemotion/nokogiri/blob/master/ext/nokogiri/xml_node.c#L351

@Zapotek I can't say I understand why null bytes would be meaningful, but I'll suspend disbelief and we'll try to figure this out.

@larskanis
Member

StringValueCStr() ensures not only that a null character is at the end of the string, but also that no other null character is within the string. Both are enforced in nokogiri-1.6.8-rc2 for most of the strings sent to libxml/libxslt.
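The interior-null check can be modeled in plain Ruby. This is only a sketch of the behavior, not Nokogiri's or Ruby's actual C code; `c_string_for` is a hypothetical helper name, and the error message is copied from the backtrace earlier in this thread.

```ruby
# Hypothetical pure-Ruby model of the StringValueCStr() check.
# Ruby strings expose no terminator, so only the "no interior null"
# half of the guarantee is modeled here.
def c_string_for(value)
  str = String(value)
  raise ArgumentError, "string contains null byte" if str.include?("\0")
  str
end

c_string_for("stuff")          # returns "stuff"

begin
  c_string_for("stu\0ff")
rescue ArgumentError => e
  puts e.message               # prints "string contains null byte"
end
```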

What options do we have?

  1. Revert to 1.6.7-behavior. This means that the null byte and all characters after the null byte are silently ignored.
  2. Keep 1.6.8-behavior. This means that the user will get notified about null bytes that would damage the inserted content.
  3. Follow the behavior of nokogiri@jruby. In JRuby all null bytes are treated as valid UTF-8 characters. However, this doesn't work with libxml, because libxml doesn't export any characters below 0x20.
  4. Write our own escape function as a replacement for libxml's xmlEncodeSpecialChars(). This could properly convert null bytes to "&#0;", but libxml still treats escaped null bytes as nothing when inserting into a node.

Since libxml doesn't allow null bytes at all - neither as plain text nor as escaped characters - I think it's best to stay at option 2.

@Zapotek
Author

Zapotek commented Feb 18, 2016

Well, if libxml strips even escaped nulls then yeah, raising an exception is the way to go I guess.
That's too bad though.

Thanks for the information folks.

@Zapotek Zapotek closed this as completed Feb 18, 2016
@larskanis
Member

@Zapotek Please note that null character handling is outside of the XML specification. So if your specific XML format requires these characters, it is not an XML format. See: https://www.w3.org/TR/REC-xml/#charsets
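For reference, the linked section of the XML 1.0 specification defines the `Char` production as `#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`, which excludes U+0000. Below is a minimal Ruby sketch of checking a payload against that production before handing it to an XML library. The method name `xml_safe?` is hypothetical, and the input is assumed to be valid UTF-8.

```ruby
# Returns true when every character of +text+ falls inside the
# XML 1.0 Char production. The character class is negated, so a
# match means an out-of-range character (such as "\0") was found.
def xml_safe?(text)
  !text.match?(/[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/)
end

puts xml_safe?("stuff")      # prints true
puts xml_safe?("test \0")    # prints false
```

A caller could use a check like this to decide whether a payload must be encoded in some other way before it is written into the report.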
