Handling of null bytes #1429
Comments
@Zapotek, thanks for asking this question. However, I'm not sure how to reproduce what you're seeing. The one line you've provided runs fine for me, without an exception. When running on MRI, and therefore using libxml2, C strings are used under the hood, which may explain some aspect of null-byte "mishandling." Specifically, a C string is null-terminated, and so a null byte in the middle of a string causes premature termination of parsing. But that doesn't appear to be what you're trying to demonstrate above, so I'd love to understand better what you're seeing and how to reproduce it.
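To illustrate the C-string point above, here is a minimal Ruby sketch. It does not call Nokogiri or libxml2; the helper name `c_string_view` is hypothetical, and it uses the `Z*` unpack directive only to mimic how code that expects a null-terminated buffer would read the same bytes.

```ruby
# Hypothetical helper: returns the part of a Ruby String that C code treating
# the buffer as a null-terminated string would see.
def c_string_view(str)
  # "Z*" reads bytes up to (and not including) the first null byte,
  # or the whole string if there is none.
  str.unpack1("Z*")
end

payload = "abc\0def"

# A Ruby String is length-counted, so the embedded null byte is ordinary data.
puts payload.bytesize

# The C-style view stops at the null byte, so "def" is silently lost.
puts c_string_view(payload).inspect
```

This is the kind of silent truncation that either raising an exception or stripping the byte is meant to avoid.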
Are you using 1.6.8.rc2? Because null bytes seem to just get ignored without warning on 1.6.7.
Ah, it wasn't clear that you were on 1.6.8.rc2 (without looking closely at your error messages). This is likely related to the fact that Ruby's internal strings were null-terminated before 2.2.0, but are no longer guaranteed to be. As a result, we now explicitly use null-terminated strings internally where we didn't always previously, in order to match libxml2's semantics. Can you help me understand the real-world problem you're encountering? It may still be possible for us to change this behavior, though at this point it's not obvious to me how, or why, we should do that. Can you explain a bit about the document you're parsing or the objective you're trying to accomplish?
In my case, I'm generating an XML report of a web application's security scan from Arachni. Some vulnerabilities can only be identified by using null-terminated strings (ironically enough, to trigger the same behavior causing this bug) and the relevant payloads (which can contain null bytes) need to be reported in order for the user to reproduce and verify the reported vulnerabilities. EDIT: If there's no support for null bytes in Nokogiri then I either can't generate the report or will end up providing inaccurate data to my users, thus preventing them from properly verifying and fixing the reported issues.
@larskanis - any thoughts on this? You're more familiar with the
@Zapotek I can't say I understand why null bytes would be meaningful, but I'll suspend disbelief and we'll try to figure this out.
What options do we have?
Since libxml doesn't allow null bytes at all, neither as plain text nor as escaped characters, I think it's best to stay at option 2.
Well, if libxml strips even escaped nulls then yeah, raising an exception is the way to go I guess. Thanks for the information folks.
@Zapotek Please note that the null character is outside the range of characters the XML specification allows. So if your specific format requires these characters, it is not an XML format. See: https://www.w3.org/TR/REC-xml/#charsets
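As a rough sketch of what the linked section implies, the following Ruby checks a string against the `Char` production of XML 1.0 and, when the string contains a disallowed character such as a null byte, falls back to Base64 so the original bytes can still be reported. This is only one possible workaround, not something Nokogiri does; the names `xml_char?`, `xml_safe?` and `encode_payload` and the `:encoding`/`:value` keys are hypothetical, and the input is assumed to be UTF-8.

```ruby
# Char production from XML 1.0 (https://www.w3.org/TR/REC-xml/#charsets):
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
def xml_char?(codepoint)
  codepoint == 0x9 || codepoint == 0xA || codepoint == 0xD ||
    (0x20..0xD7FF).cover?(codepoint) ||
    (0xE000..0xFFFD).cover?(codepoint) ||
    (0x10000..0x10FFFF).cover?(codepoint)
end

# True when every character of the text may legally appear in an XML 1.0
# document. The encoding check comes first because each_codepoint raises
# on invalidly encoded strings.
def xml_safe?(text)
  text.valid_encoding? && text.each_codepoint.all? { |cp| xml_char?(cp) }
end

# Hypothetical report helper: keeps XML-safe text as is and Base64-encodes
# anything else, so a consumer can decode the exact payload bytes.
def encode_payload(text)
  if xml_safe?(text)
    { encoding: "plain", value: text }
  else
    # "m0" is strict Base64 (no line breaks) from core Ruby.
    { encoding: "base64", value: [text].pack("m0") }
  end
end

p encode_payload("id=1")
p encode_payload("abc\0def")
```

A report writer could store the `:encoding` value as an attribute next to the element text, and a reader would call `unpack1("m0")` on Base64 values to recover the payload, null bytes included.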
Hello,
Is there a reason why null bytes aren't encoded as an escaped character reference (such as &#0;) instead of raising an exception?
For example, the following:
results in:
Also, they seem to get ignored during parsing as well:
Cheers