Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TextDecoder('utf-8') doesn’t match spec #16894

Closed
srl295 opened this issue Nov 9, 2017 · 14 comments
Closed

TextDecoder('utf-8') doesn’t match spec #16894

srl295 opened this issue Nov 9, 2017 · 14 comments
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. i18n-api Issues and PRs related to the i18n implementation.

Comments

@srl295
Copy link
Member

srl295 commented Nov 9, 2017

  • Version: tested v8.5.0 / master
  • Subsystem: core

from discussion in #16876

in Node.js,

new (require('util').TextDecoder)('utf-8')
 .decode(Buffer.from([0xF0, 0x80, 0x80])).length === 1 // U+FFFD

But in Safari/FF/Chrome,

new TextDecoder('utf-8')
 .decode(new Uint8Array([0xF0, 0x80, 0x80])).length === 3 // U+FFFD U+FFFD U+FFFD

think Node.js is wrong here per https://encoding.spec.whatwg.org/#utf-8-decoder

//cc @mathiasbynens

@srl295 srl295 added the i18n-api Issues and PRs related to the i18n implementation. label Nov 9, 2017
@srl295 srl295 mentioned this issue Nov 9, 2017
2 tasks
@srl295
Copy link
Member Author

srl295 commented Nov 9, 2017

@jungshik
Copy link

jungshik commented Nov 9, 2017

The code snippet above in web browsers(implementing TextDecoder API) should read as following:

new TextDecoder('utf-8')
 .decode(new Uint8Array([0xF0, 0x80, 0x80])).length

@srl295
Copy link
Member Author

srl295 commented Nov 9, 2017

@jungshik oops… exactly right. I fixed it.

@srl295
Copy link
Member Author

srl295 commented Nov 9, 2017

^ already fixed in v8 (see link above)

Yea I recently fixed this in V8. The fix is in V8 6.3.
The relevant CL is https://chromium-review.googlesource.com/671020
See also https://bugs.chromium.org/p/chromium/issues/detail?id=765608

@srl295 srl295 added the v8 engine Issues and PRs related to the V8 dependency. label Nov 9, 2017
@tniessen
Copy link
Member

The relevant commit is v8/v8@6389b7e. Should we cherry-pick this for node 8?

@srl295
Copy link
Member Author

srl295 commented Nov 10, 2017

@tniessen seems like a good idea—means the other UTF-8 handling (v8 internal, and via ICU) will be in line with TextDecoder

@TimothyGu
Copy link
Member

TimothyGu commented Nov 10, 2017

This issue really has nothing to do with V8, since TextDecoder is backed by ICU. On master, this has been fixed by #16876, but not in v9.x or earlier. It seems the original issue (TextDecoder) only affects released versions of Node.js. We'll also have to make the same fix for require('string_decoder').StringDecoder, if we were to go all in.

@TimothyGu TimothyGu added c++ Issues and PRs that require attention from people who are familiar with C++. and removed v8 engine Issues and PRs related to the V8 dependency. labels Nov 10, 2017
@TimothyGu
Copy link
Member

TimothyGu commented Nov 10, 2017

I guess we just need to backport

to v8.x, where the TextDecoder API is actually stable; or we can just upgrade ICU to 60 in v8.x LTS. We'll also need tests, as it seems WPT does not currently contain any tests for this behavior. /cc @jasnell

@srl295
Copy link
Member Author

srl295 commented Nov 11, 2017

No please do not backport those commits. V8 and not icu is used for utf -8. Check the code. @jasnell

@TimothyGu
Copy link
Member

@srl295 I have just backported the commits in my local branch, and this issue was fixed. Can you explain how V8 is used for UTF-8? If you mean

return encodeUtf8String(`${input}`);
, that is for TextEncoder, not TextDecoder.

@srl295
Copy link
Member Author

srl295 commented Nov 11, 2017 via email

@TimothyGu
Copy link
Member

@srl295

The issue I saw is there even with no icu

no-icu uses string_decoder.StringDecoder, which as I noted in #16894 (comment) handles this wrongly as well.

@srl295
Copy link
Member Author

srl295 commented Nov 16, 2017 via email

@jasnell
Copy link
Member

jasnell commented Aug 11, 2018

appears to be fixed in 10.x and master

@jasnell jasnell closed this as completed Aug 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. i18n-api Issues and PRs related to the i18n implementation.
Projects
None yet
Development

No branches or pull requests

5 participants