Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: remove BINARY encoding #5504

Closed
wants to merge 1 commit into from

Conversation

indutny
Copy link
Member

@indutny indutny commented Mar 1, 2016

Pull Request check-list

Please make sure to review and check all of these items:

  • Does make -j8 test (UNIX) or vcbuild test nosign (Windows) pass with
    this change (including linting)?
  • Is the commit message formatted according to CONTRIBUTING.md?
  • If this change fixes a bug (or a performance problem), is a regression
    test (or a benchmark) included?
  • Is a documentation update included (if this change modifies
    existing APIs, or introduces new ones)?

NOTE: these things are not required to open a PR and can be done
afterwards / while the PR is open.

Affected core subsystem(s)

Please provide affected core subsystem(s) (like buffer, cluster, crypto, etc)

Description of change

It was fun,
now it is gone.


I decided to follow @chrisdickinson footsteps and submit provocative PR 😉 Hopefully this will help to spin of a useful discussion!

It was fun,
now it is gone.
@indutny
Copy link
Member Author

indutny commented Mar 1, 2016

@indutny indutny mentioned this pull request Mar 1, 2016
@indutny
Copy link
Member Author

indutny commented Mar 1, 2016

cc @nodejs/ctc @nodejs/collaborators @nodejs/crypto

@indutny indutny added semver-major PRs that contain breaking changes and should be released in the next major version. tls Issues and PRs related to the tls subsystem. crypto Issues and PRs related to the crypto subsystem. buffer Issues and PRs related to the buffer subsystem. http Issues or PRs related to the http subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. doc Issues and PRs related to the documentations. net Issues and PRs related to the net subsystem. labels Mar 1, 2016
@Fishrock123
Copy link
Contributor

Could you explain what benefit this is to us? I don't really understand.

@indutny
Copy link
Member Author

indutny commented Mar 1, 2016

Of course, binary encoding causes significant problems in crypto, and less often in other places. In general, it looks like people don't really understand how it works, and even if they do - other people that are using their code don't understand the limitations of binary.

As it is right now, it does not provide much use IMO, and there are lots of code to handle it. Therefore I propose removing it.

@mscdex
Copy link
Contributor

mscdex commented Mar 1, 2016

-100. The binary encoding is still very useful in many situations, for example: when you need/want to take advantage of existing string or other functions (e.g. regexps) that cannot handle Buffers.

@indutny
Copy link
Member Author

indutny commented Mar 1, 2016

@mscdex example?

@indutny
Copy link
Member Author

indutny commented Mar 1, 2016

What if I will change defaults for crypto module instead? How should I approach it?

@mscdex
Copy link
Contributor

mscdex commented Mar 1, 2016

@indutny Regexps was one such example. Anything from String.prototype.

@indutny
Copy link
Member Author

indutny commented Mar 1, 2016

@mscdex why utf8 won't work in your case?

@mscdex
Copy link
Contributor

mscdex commented Mar 1, 2016

@indutny I think the reason that the default encoding is binary for crypto is that the typical use case is binary data. Yes most people will probably be using Buffers, but the binary encoding keeps with that theme, but for non-Buffer inputs.

@mscdex why utf8 won't work in your case?

I'm not sure what you mean here. Converting binary to utf8 is not safe.

@seishun
Copy link
Contributor

seishun commented Mar 1, 2016

+1 it was meant to be removed long ago.

@mscdex why would you use regexp for non-textual data?

@indutny
Copy link
Member Author

indutny commented Mar 1, 2016

@mscdex it is not safe to pass utf8 as binary. Also binary encoding may produce some unexpected results. What is unsafe about utf8?

@mscdex I would really appreciate if you could share a link with me, demonstrating a use case in some project. From what I heard, it does not look like binary is a requirement.

@mscdex
Copy link
Contributor

mscdex commented Mar 1, 2016

@seishun Because it's useful if you need something more than a simple .indexOf()? For example, even needing to do a simple case-insensitive .indexOf() is impossible with Buffers.

@seishun
Copy link
Contributor

seishun commented Mar 1, 2016

@mscdex as @indutny said, a real-world usage example would be welcome.

For example, even needing to do a simple case-insensitive .indexOf() is impossible with Buffers.

Binary data doesn't have a concept of case, it's just a byte sequence.

@mscdex
Copy link
Contributor

mscdex commented Mar 1, 2016

@indutny My point is that if you change the default encoding to utf8, strings containing binary bytes can get misinterpreted:

> new Buffer('\xFF\x7F', 'utf8')
<Buffer c3 bf 7f>

@indutny
Copy link
Member Author

indutny commented Mar 1, 2016

Well, same thing with binary?

> new Buffer('привет', 'binary').toString()
'?@825B'

@mscdex
Copy link
Contributor

mscdex commented Mar 1, 2016

@indutny I don't see that as the same thing. Once you have imported the binary string with the 'binary' encoding there is no need to do .toString(). If you remove the 'binary' encoding you can no longer import the string safely (or generate a 'binary' string for manipulation when Buffer methods aren't enough as I previously described).

@indutny
Copy link
Member Author

indutny commented Mar 1, 2016

@mscdex what is "import the string safely" in your definition?

@micnic
Copy link
Contributor

micnic commented Mar 1, 2016

Is there any performance impact when using utf8 as default encoding?

@mscdex
Copy link
Contributor

mscdex commented Mar 1, 2016

@indutny "safely" meaning the data/content is not modified

scovetta pushed a commit to scovetta/node that referenced this pull request Apr 2, 2016
Debug mode slows execution speed. There is work afoot to enable Debug
mode runs on the continuous integration infrastructure for the project.
Some tests are timing out, such as test-net-nodejsGH-5504.js.

This change doubles the timeout returned from `common.platformTimeout()`
when running in Debug mode. It also removes an unused variable from the
aforementioned test-net-nodejsGH-5504.js.

PR-URL: nodejs#4431
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
@vitaliylag
Copy link

vitaliylag commented Apr 23, 2016

Examples when Binary is useful:

var obj = {};
function wasExecuted(buf) {
    return buf.toString('binary') in obj;
    //Here we could use hex but it works slower and requires more memory
}
function addNumberToString(s, num) {
    var isBuf = Buffer.isBuffer(s) && (s = s.toString('binary'), true);

    //Here we can work with s like usual string. We don't need write individual
    //code for Buffer case. Of course we shouldn't add to string big char codes.
    s = s + num;

    return isBuf ? new Buffer(s, 'binary') : s;
}

I guess I can not use utf8 in these examples because not all bytes sequences can be safely converted to utf8, am I right?

@trevnorris
Copy link
Contributor

@vitaliylag The majority of utf8 byte sequences can't be converted.

A useful usage I've used in the past is when I want to hold many open file contents that are relatively large in size (>= 1MB). Done this while parsing data sets. If the file text can be assumed as ascii or latin1 then the string can be referenced out side of the heap. Example script:

const buf = Buffer.alloc(1024 * 1024, 'a');
const arr = [];
for (var i = 0; i < 1e3; i++) {
  arr.push(buf.toString('binary'));
  //arr.push(buf.toString('utf8'));
}
process._rawDebug(require('v8').getHeapStatistics().used_heap_size / 0x100000);

Using 'binary' encoding output is ~3MB. Whereas 'utf8' sits just over 1GB.

bnoordhuis referenced this pull request in bnoordhuis/io.js Jul 4, 2016
Avoid transient DNS issues in test sequential/test-net-GH-5504 by using
the IP address instead of the 'localhost' host name.

Fixes: nodejs#6611
PR-URL: nodejs#7524
Reviewed-By: Brian White <mscdex@mscdex.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Fishrock123 referenced this pull request Jul 5, 2016
Avoid transient DNS issues in test sequential/test-net-GH-5504 by using
the IP address instead of the 'localhost' host name.

Fixes: #6611
PR-URL: #7524
Reviewed-By: Brian White <mscdex@mscdex.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
MylesBorins referenced this pull request Jul 11, 2016
Avoid transient DNS issues in test sequential/test-net-GH-5504 by using
the IP address instead of the 'localhost' host name.

Fixes: #6611
PR-URL: #7524
Reviewed-By: Brian White <mscdex@mscdex.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
MylesBorins referenced this pull request Jul 12, 2016
Avoid transient DNS issues in test sequential/test-net-GH-5504 by using
the IP address instead of the 'localhost' host name.

Fixes: #6611
PR-URL: #7524
Reviewed-By: Brian White <mscdex@mscdex.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
MylesBorins referenced this pull request Jul 12, 2016
Avoid transient DNS issues in test sequential/test-net-GH-5504 by using
the IP address instead of the 'localhost' host name.

Fixes: #6611
PR-URL: #7524
Reviewed-By: Brian White <mscdex@mscdex.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
MylesBorins referenced this pull request Jul 14, 2016
Avoid transient DNS issues in test sequential/test-net-GH-5504 by using
the IP address instead of the 'localhost' host name.

Fixes: #6611
PR-URL: #7524
Reviewed-By: Brian White <mscdex@mscdex.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
MylesBorins referenced this pull request Jul 14, 2016
Avoid transient DNS issues in test sequential/test-net-GH-5504 by using
the IP address instead of the 'localhost' host name.

Fixes: #6611
PR-URL: #7524
Reviewed-By: Brian White <mscdex@mscdex.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Trott added a commit to Trott/io.js that referenced this pull request Nov 4, 2016
test-net-nodejsGH-5504 failed on smartos14-64 in CI. The internal test timeout
expired. It is currently 2 seconds. That appears to be somewhat
arbitrary. It may be possible to rewrite the test without the timer.  As
an immediate workaround, increase the timeout to 5 seconds.
santigimeno added a commit to santigimeno/node that referenced this pull request Nov 6, 2016
The test is failing on `SmartOS` quite often. Removing the timeout seems
to fix it.

Fixes: nodejs#8930
PR-URL: nodejs#9461
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
evanlucas pushed a commit that referenced this pull request Nov 7, 2016
The test is failing on `SmartOS` quite often. Removing the timeout seems
to fix it.

Fixes: #8930
PR-URL: #9461
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
MylesBorins pushed a commit that referenced this pull request Nov 22, 2016
The test is failing on `SmartOS` quite often. Removing the timeout seems
to fix it.

Fixes: #8930
PR-URL: #9461
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
@MylesBorins MylesBorins mentioned this pull request Nov 22, 2016
Trott added a commit to Trott/io.js that referenced this pull request May 14, 2017
* Improve comments describing test.
* Remove unnecessary `common.noop`
* Remove confusing scoping of `c` identifier
Trott added a commit to Trott/io.js that referenced this pull request May 20, 2017
* Improve comments describing test.
* Remove unnecessary `common.noop`
* Remove confusing scoping of `c` identifier

PR-URL: nodejs#13025
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@jasnell jasnell mentioned this pull request May 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. crypto Issues and PRs related to the crypto subsystem. doc Issues and PRs related to the documentations. http Issues or PRs related to the http subsystem. net Issues and PRs related to the net subsystem. semver-major PRs that contain breaking changes and should be released in the next major version. tls Issues and PRs related to the tls subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.