doc: mark String decoder as legacy #39301

jimmywarting · 2021-07-07T23:13:03Z

📗 API Reference Docs Problem

https://nodejs.org/docs/latest-v16.x/api/string_decoder.html

Description

As with util.inherits and querystring, can you also mark string_decoder as legacy and say something along the lines of:

The String decoder is considered Legacy. While it is still maintained, new code should use the TextDecoder API instead.

This would be best for cross-platform code... Maybe it would even be possible to deprecate it?

I would like to work on this issue and submit a pull request.

aduh95 · 2021-07-08T08:46:21Z

I think TextDecoder encoding support is limited to utf-8, while String decoder also supports utf-16, but maybe that's fine.

targos · 2021-07-08T08:51:29Z

I think TextDecoder encoding support is limited to utf-8, while String decoder also supports utf-16, but maybe that's fine.

TextDecoder supports a lot of different encodings: https://nodejs.org/dist/latest-v16.x/docs/api/util.html#util_encodings_supported_by_default_with_full_icu_data
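
As a minimal sketch of the point about encodings, the snippet below decodes the same UTF-16LE bytes with both APIs in Node. The sample bytes and variable names are illustrative assumptions, not taken from the thread.

```javascript
const { StringDecoder } = require('string_decoder')

// "hi" encoded as UTF-16LE (two bytes per character, low byte first)
const bytes = new Uint8Array([0x68, 0x00, 0x69, 0x00])

// TextDecoder takes the WHATWG encoding label
const viaTextDecoder = new TextDecoder('utf-16le').decode(bytes)

// StringDecoder takes the Node.js Buffer encoding name
const viaStringDecoder = new StringDecoder('utf16le').end(Buffer.from(bytes))

console.log(viaTextDecoder, viaStringDecoder)
```

Note that the two APIs spell the encoding name differently ('utf-16le' vs. 'utf16le').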

aduh95 · 2021-07-08T09:01:55Z

My bad, I mixed it up with TextEncoder! Please disregard my previous comment.

mojavelinux · 2021-07-10T09:32:11Z

In a quick test, TextDecoder is an order of magnitude slower to decode a Uint8Array to a String than StringDecoder using Node 16. And I can't find anything that works faster than StringDecoder. Please don't deprecate StringDecoder until the performance is at least equivalent.

jimmywarting · 2021-07-10T10:51:29Z

@mojavelinux would you mind sharing a performance test case, with test results?

We may not go as far as deprecating and eventually removing it.
Maybe we will just mark it as legacy and say in the docs that it's best avoided if you write cross-compatible code.

Have you also considered that TextDecoder takes longer because it solves some problems that StringDecoder does not handle? It may solve something that StringDecoder does not, like the BOM for instance.

I'm also curious how the browserified version compares to the native TextDecoder. This is the main reason why I want to see less use of StringDecoder: to stop people from exporting Node modules into browsers and increasing the bundle size too much, and to encourage writing better cross-compatible code.

string_decoder depends on Buffer, so depending on https://www.npmjs.com/package/string_decoder in the browser adds a whole bunch of bloat to your bundle, and the browser Buffer is not as fast as Node's Buffer.

jimmywarting · 2021-07-10T11:05:32Z

const { StringDecoder } = require('string_decoder');

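const data = new Uint8Array([239, 187, 191, 49]) // UTF-8 BOM (EF BB BF) + "1" (49)

// fails: the BOM stays in the decoded string, so JSON.parse throws
JSON.parse(new StringDecoder('utf8').write(data))
JSON.parse(Buffer.from(data.buffer).toString())

// works: the BOM is stripped while decoding
JSON.parse(new TextDecoder().decode(data))
new Response(data).json() // returns a Promise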

I believe fetch decodes data similarly to how TextDecoder does it, more correctly according to a spec...
related: node-fetch/node-fetch#541
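
To make the BOM difference concrete, here is a small sketch using the same four bytes. It relies on the standard `ignoreBOM` option of TextDecoder; the variable names are illustrative.

```javascript
// UTF-8 BOM (EF BB BF) followed by the character "1"
const data = new Uint8Array([0xef, 0xbb, 0xbf, 0x31])

// Default behavior: the BOM is consumed and not part of the output
const stripped = new TextDecoder().decode(data)

// With ignoreBOM: true the BOM is kept as U+FEFF at the start of the string
const kept = new TextDecoder('utf-8', { ignoreBOM: true }).decode(data)

console.log(stripped.length) // 1
console.log(kept.length) // 2
console.log(kept.charCodeAt(0) === 0xfeff) // true
```

With `ignoreBOM: true` the output matches what StringDecoder produces for this input, which is why JSON.parse rejects it.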

mojavelinux · 2021-07-10T11:35:17Z

A pretty basic test can demonstrate the order of magnitude change:

const iterations = 10000000

const stringDecoder = new (require('string_decoder').StringDecoder)()
const textDecoder = new (require('util').TextDecoder)()

const SAMPLE = new Uint8Array([112,97,103,101,45,111,110,101,46,97,100,111,99])

function variationA (arr) {
  return stringDecoder.write(arr)
}

function variationB (arr) {
  return textDecoder.decode(arr)
}

let result
const variation = process.argv[2] === 'A' ? variationA : variationB
console.time(variation.name)
for (let i = 0; i < iterations; i++) {
  result = variation(SAMPLE)
}
console.timeEnd(variation.name)

On my machine, the results are as follows:

node perf.js A
//=> variationA: 940.728ms
node perf.js B
//=> variationB: 8.244s

In Chrome, I get:

variationB: 5491ms

It may solve something that StringDecoder does not, like the BOM for instance.

Perhaps I don't need that behavior ;) I also don't think BOM processing warrants an order-of-magnitude difference in execution time.

jimmywarting · 2021-07-10T11:38:35Z

In Chrome, I get:
variationB: 5491ms

Missing variationA in browser

mojavelinux · 2021-07-10T11:39:10Z

I don't know what the API in the browser is for StringDecoder.

jimmywarting · 2021-07-10T11:40:50Z

I don't know what the API in the browser is for StringDecoder.

something like:

npx browserify file.js > out.js

// file.js
x = require('string_decoder')

mojavelinux · 2021-07-10T11:41:37Z

I'm fine with removing the public use of StringDecoder. Perhaps under the covers in Node.js, TextDecoder can use the StringDecoder code when the encoding is utf-8 and ignoreBOM is true. I just don't want to have to bear such a tremendous drop in performance as it will affect the performance of my application (which calls this method hundreds of thousands of times).

mojavelinux · 2021-07-10T11:42:25Z

npx browserify file.js > out.js

All that's doing is testing the implementation in Node.js in a browser. I wouldn't expect there to be any difference since it's the same code (and the same JavaScript engine). I thought you were comparing to a native implementation in the browser.

mojavelinux · 2021-07-10T11:45:16Z

As I suspected, the performance is just as good in the browser for StringDecoder. In Chrome, I get:

variationA: 1389ms

What we see is that the performance of the native TextDecoder in Chrome is better than the TextDecoder in Node.js. But there is still a measurable difference when compared to StringDecoder.

targos · 2021-07-10T11:49:22Z

A quick run with the profiler shows most of the time is spent in one function:

Actually, another run shows something different...

jimmywarting · 2021-07-10T11:57:00Z

I modified the test a bit and ran it on some webpages with a realistic, larger sample set

const SAMPLE = new TextEncoder().encode(document.body.innerText)

It shows variationA is much slower... (in Chrome, after being browserified)

const iterations = 1000

const stringDecoder = new (require('string_decoder').StringDecoder)()
const textDecoder = (new TextDecoder())

const SAMPLE = new TextEncoder().encode(document.body.innerText)

function variationA (arr) {
  return stringDecoder.write(arr)
}

function variationB (arr) {
  return textDecoder.decode(arr)
}

let result


console.time(variationA.name)
for (let i = 0; i < iterations; i++) {
  result = variationA(SAMPLE)
}
console.timeEnd(variationA.name)


console.time(variationB.name)
for (let i = 0; i < iterations; i++) {
  result = variationB(SAMPLE)
}
console.timeEnd(variationB.name)

Using the innerText of nodejs.org:

variationA: 31.10400390625 ms
variationB: 1.6748046875 ms

jimmywarting · 2021-07-10T12:00:58Z

Here is my browser bundle: https://pastebin.com/DDMKi3yH (for those who do not want to run browserify and want to test it out for themselves). The tests are at the bottom. The fact that it requires ~2400 lines of code in the browser and runs much slower is exactly the reason why I want to discourage use of it.

EDIT:

here is a cdn solution:

const {StringDecoder} = await import('https://jspm.dev/string_decoder')
const iterations = 1000
const stringDecoder = new StringDecoder()
const textDecoder = new TextDecoder()

const SAMPLE = new TextEncoder().encode(document.body.innerText)

function variationA (arr) {
  return stringDecoder.write(arr)
}

function variationB (arr) {
  return textDecoder.decode(arr)
}

let result


console.time(variationA.name)
for (let i = 0; i < iterations; i++) {
  result = variationA(SAMPLE)
}
console.timeEnd(variationA.name)


console.time(variationB.name)
for (let i = 0; i < iterations; i++) {
  result = variationB(SAMPLE)
}
console.timeEnd(variationB.name)

mojavelinux · 2021-07-10T12:19:47Z

I can confirm that TextDecoder does seem to perform better for larger sample sizes. Perhaps there is a problem with the implementation that it performs so poorly for a small sample size. I don't know enough at this point to begin to be able to explain what's going on. These are certainly useful observations.
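
A rough way to reproduce the small-versus-large observation in Node alone is sketched below. The `time` helper, the sample sizes, and the iteration counts are assumptions chosen for illustration; absolute numbers will vary by machine and Node version, so no expected timings are shown.

```javascript
const { StringDecoder } = require('string_decoder')
const { performance } = require('perf_hooks')

const stringDecoder = new StringDecoder('utf8')
const textDecoder = new TextDecoder()

// Hypothetical helper: run fn(input) `iterations` times, return elapsed ms and last result
function time (fn, input, iterations) {
  const start = performance.now()
  let result
  for (let i = 0; i < iterations; i++) result = fn(input)
  return { ms: performance.now() - start, result }
}

// Same 13-byte sample as the earlier test, plus a much larger input
const small = new TextEncoder().encode('page-one.adoc')
const large = new TextEncoder().encode('page-one.adoc\n'.repeat(10000))

const timings = []
for (const [label, sample, iterations] of [['small', small, 100000], ['large', large, 100]]) {
  const a = time((arr) => stringDecoder.write(arr), sample, iterations)
  const b = time((arr) => textDecoder.decode(arr), sample, iterations)
  timings.push({ label, same: a.result === b.result })
  console.log(`${label}: StringDecoder ${a.ms.toFixed(1)}ms, TextDecoder ${b.ms.toFixed(1)}ms`)
}
```

Running both sizes in one process keeps the comparison on the same runtime, though JIT warm-up can still skew the first measurement.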

targos · 2021-07-10T12:33:14Z

I did other runs (of #39301 (comment)) with the code from master instead of a release and here's what I get:

The 60% of "self time" in TextDecoder.decode actually comes from the call to the C++ function here:

node/src/node_i18n.cc, line 432 in e2148d7:

void ConverterObject::Decode(const FunctionCallbackInfo<Value>& args) {

jasnell · 2021-07-10T23:18:49Z

@mojavelinux:

Please don't deprecate StringDecode...

Marking an API as legacy is not the same as deprecating it. APIs that are marked legacy are unlikely to ever change or be deprecated.

mojavelinux · 2021-07-11T01:54:54Z

Thanks for pointing that out. Sorry for creating any confusion by mixing up the terminology.

zcbenz · 2024-07-18T01:51:09Z

node:string_decoder has a feature where it holds the bytes until a full Unicode character can be printed, which I don't think TextDecoder provides:

When a Buffer instance is written to the StringDecoder instance, an internal buffer is used to ensure that the decoded string does not contain any incomplete multibyte characters. These are held in the buffer until the next call to stringDecoder.write() or until stringDecoder.end() is called.

This feature is very convenient when printing outputs byte by byte, for example when implementing the streaming interface of LLM inference.

targos · 2024-07-18T05:45:17Z

You can do that with TextDecoder as well:

> var d = new TextDecoder
undefined
> d.decode(Uint8Array.of(0xe2), { stream: true })
''
> d.decode(Uint8Array.of(0x82), { stream: true })
''
> d.decode(Uint8Array.of(0xAC), { stream: true })
'€'
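
As a script rather than a REPL session, the same streaming behavior can be sketched side by side for both decoders. The chunking into single bytes and the variable names are illustrative assumptions.

```javascript
const { StringDecoder } = require('string_decoder')

// The three UTF-8 bytes of "€" (U+20AC), delivered one byte per chunk
const chunks = [Uint8Array.of(0xe2), Uint8Array.of(0x82), Uint8Array.of(0xac)]

// TextDecoder: { stream: true } buffers incomplete sequences between calls
const textDecoder = new TextDecoder()
let viaTextDecoder = ''
for (const chunk of chunks) {
  viaTextDecoder += textDecoder.decode(chunk, { stream: true })
}
viaTextDecoder += textDecoder.decode() // final call without stream flushes any remainder

// StringDecoder: write() buffers incomplete sequences, end() flushes
const stringDecoder = new StringDecoder('utf8')
let viaStringDecoder = ''
for (const chunk of chunks) {
  viaStringDecoder += stringDecoder.write(chunk)
}
viaStringDecoder += stringDecoder.end()

console.log(viaTextDecoder, viaStringDecoder)
```

The final `decode()` call plays the same role as `stringDecoder.end()`: it signals the end of input so that any trailing incomplete bytes are handled rather than silently held.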

jimmywarting added the doc Issues and PRs related to the documentations. label Jul 7, 2021

targos added encoding Issues and PRs related to the TextEncoder and TextDecoder APIs. string_decoder Issues and PRs related to the string_decoder subsystem. and removed encoding Issues and PRs related to the TextEncoder and TextDecoder APIs. labels Aug 9, 2021

jimmywarting mentioned this issue Aug 25, 2021

TextDecoder is slow. #39879

Open

jimmywarting mentioned this issue Mar 8, 2022

Start moving to Uint8Array in new APIs? #41588

Open

jimmywarting mentioned this issue Aug 11, 2022

node::Buffer::New is 4x slower than V8 APIs #44111

Closed

jimmywarting mentioned this issue Oct 31, 2022

Migrate to Uint8Array mtth/avsc#410

Open

jimmywarting mentioned this issue Nov 10, 2022

buffer: add asString static method #45408

Closed

JohnGu9 mentioned this issue Nov 28, 2023

replace StringDecoder with TextDecoder ashtuchkin/iconv-lite#316

Open
