ungzipping buffer with trailing garbage throws error #7502

Closed
horpto opened this issue Jun 30, 2016 · 5 comments
Labels
zlib Issues and PRs related to the zlib subsystem.

Comments

@horpto

horpto commented Jun 30, 2016

One site sends a gzipped HTML page, and when I try to gunzip it for further processing, zlib throws the error "incorrect header check". The error happens because there is extra data after the gzipped page. Yes, the gzipped data is not valid, since it has trailing data, but on the other hand, browsers handle it without complaint.

Minimal example:

const zlib = require('zlib');

// from test-zlib-from-gzip-with-trailing-garbage.js
// gzip member followed by ten zero bytes of padding
const bufOk = Buffer.concat([
    zlib.gzipSync("abc"),
    Buffer.alloc(10)
]);

// gzip member followed by non-zero trailing bytes
const bufWrong = Buffer.concat([
    zlib.gzipSync("abc"),
    Buffer.from("\n \n\n\n\n \n")
]);

// will not raise an Error
console.log("bufOk:", zlib.gunzipSync(bufOk).toString());

// will raise Error('incorrect header check')
console.log("bufWrong:", zlib.gunzipSync(bufWrong).toString());

Also live example:

var req = require('request')
req.get('http://irsural.ru', {gzip: true}, (err, res, body) => {
    if (err != null) {
        console.log('Error:', err);
        return 
    }
    console.log(body);
});

In other languages (for example Python, with requests) I can set windowBits to 16 + zlib.Z_MAX_WINDOWBITS, but Node strictly restricts windowBits to the range 0 to 15, even in the addon.

  • Version: v6.2.2
  • Platform: linux, x86_64
  • Subsystem: zlib
@mscdex added the zlib label Jun 30, 2016
@bnoordhuis
Member

See PR #5883. Node.js used to silently ignore trailing garbage, but it no longer does. Zero padding is still allowed, but everything else will raise an error.

cc @addaleax
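
As an illustration of the behaviour described in this comment (a minimal sketch, not code from the thread or from PR #5883), the two inputs below are expected to decode without error: two gzip members back to back, and one member followed by zero padding. The variable names are illustrative only.

```javascript
const zlib = require('zlib');

// Two complete gzip members, one directly after the other, decode as one stream.
const twoMembers = Buffer.concat([zlib.gzipSync('abc'), zlib.gzipSync('def')]);
const joined = zlib.gunzipSync(twoMembers).toString();

// Trailing zero bytes are treated as padding and skipped.
const zeroPadded = Buffer.concat([zlib.gzipSync('abc'), Buffer.alloc(10)]);
const padded = zlib.gunzipSync(zeroPadded).toString();

console.log('joined:', joined);
console.log('padded:', padded);
```

Any other non-zero bytes after the last member are handed back to the decoder as if they started a new member, which is where the header check in the original report fails.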

@addaleax
Member

> In other languages (for example Python, with requests) I can set windowBits to 16 + zlib.Z_MAX_WINDOWBITS, but Node strictly restricts windowBits to the range 0 to 15, even in the addon.

Python’s gzip module seems to behave exactly like Node’s does, btw.

> Yes, the gzipped data is not valid, since it has trailing data, but on the other hand, browsers handle it without complaint.

There is, unfortunately, no common behaviour for different kinds of trailing data in browsers; often, trailing garbage will be ignored, but e.g. if it looks like the start of another gzip member, Chrome will just bail out completely (I don’t know why it does that, though).

@jasnell
Member

jasnell commented Aug 9, 2016

Is there anything to do here, @addaleax and @bnoordhuis?

@horpto closed this as completed Aug 9, 2016
@horpto
Author

horpto commented Aug 9, 2016

It's a bad feature, but not a bug.

@mnmkng

mnmkng commented Jun 6, 2020

Sorry for resurrecting this after 4 years @addaleax, but I wasn't able to find any solution elsewhere. Is there a way to ignore the trailing garbage with the currently available options? I tried various combinations of zlib.constants, but with no luck.

Example website:

https://www.ebay.com/sch/sis.html?_nkw=Beechcraft

Browser works fine, gunzip works fine.

curl https://www.ebay.com/sch/sis.html\?_nkw\=Beechcraft -H "accept-encoding: gzip" | gunzip

But I can't convince Node.js to decompress this payload, aside from cutting off the end of it with buffer.slice. Since the HTML size is dynamic, that doesn't really help, because I can never know in advance how much to slice.
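
One possible workaround, offered as a sketch under stated assumptions rather than an answer given in this thread: skip the gzip header by hand (using the header layout from RFC 1952) and pass the remainder to zlib.inflateRawSync. The assumption is that raw inflate stops at the end of the deflate stream and leaves whatever follows unread, so the gzip trailer and any garbage after it are never inspected. The helper names gzipHeaderLength and gunzipIgnoringTrailer are hypothetical. The sketch decodes only the first gzip member and does not verify the CRC32 or size fields, so it trades validation for tolerance.

```javascript
const zlib = require('zlib');

// Hypothetical helper: returns the offset at which the raw deflate data starts.
function gzipHeaderLength(buf) {
  // Fixed 10-byte part: ID1 ID2 CM FLG MTIME(4) XFL OS
  if (buf.length < 10 || buf[0] !== 0x1f || buf[1] !== 0x8b || buf[2] !== 8) {
    throw new Error('not a gzip stream with deflate compression');
  }
  const flags = buf[3];
  let pos = 10;

  // Skips a zero-terminated string starting at pos.
  const skipString = () => {
    const end = buf.indexOf(0, pos);
    if (end === -1) throw new Error('truncated gzip header');
    pos = end + 1;
  };

  if (flags & 0x04) {
    // FEXTRA: 2-byte little-endian length followed by that many bytes
    pos += 2 + buf.readUInt16LE(pos);
  }
  if (flags & 0x08) skipString(); // FNAME
  if (flags & 0x10) skipString(); // FCOMMENT
  if (flags & 0x02) pos += 2;     // FHCRC: 2-byte header checksum

  if (pos > buf.length) throw new Error('truncated gzip header');
  return pos;
}

// Hypothetical helper: decodes the first gzip member and ignores what follows it.
function gunzipIgnoringTrailer(buf) {
  return zlib.inflateRawSync(buf.subarray(gzipHeaderLength(buf)));
}

// Same shape of input as bufWrong in the original report.
const sample = Buffer.concat([
  zlib.gzipSync('abc'),
  Buffer.from('\n \n\n\n\n \n'),
]);
console.log(gunzipIgnoringTrailer(sample).toString());
```

Because the checksum is skipped, a corrupted body would go unnoticed, so this is only reasonable where tolerating a malformed response matters more than detecting damage.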
