src: improve buffer.transcode performance #54153
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #54153   +/-   ##
=======================================
  Coverage   87.32%   87.32%
=======================================
  Files         648      648
  Lines      182321   182297   -24
  Branches    34973    34969    -4
=======================================
- Hits       159210   159195   -15
- Misses      16376    16384    +8
+ Partials     6735     6718   -17
lgtm
Still fails on big-endian platforms.
@anonrig Try:
diff --git a/src/node_i18n.cc b/src/node_i18n.cc
index fab3d7c1c8..a3dee5ee69 100644
--- a/src/node_i18n.cc
+++ b/src/node_i18n.cc
@@ -178,7 +178,7 @@ MaybeLocal<Object> TranscodeLatin1ToUcs2(Environment* env,
UErrorCode* status) {
MaybeStackBuffer<UChar> destbuf(source_length);
auto actual_length =
- simdutf::convert_latin1_to_utf16(source, source_length, destbuf.out());
+ simdutf::convert_latin1_to_utf16le(source, source_length, destbuf.out());
if (actual_length == 0) {
*status = U_INVALID_CHAR_FOUND;
return {};
@@ -222,7 +222,7 @@ MaybeLocal<Object> TranscodeUcs2FromUtf8(Environment* env,
simdutf::utf16_length_from_utf8(source, source_length);
MaybeStackBuffer<UChar> destbuf(expected_utf16_length);
auto actual_length =
- simdutf::convert_utf8_to_utf16(source, source_length, destbuf.out());
+ simdutf::convert_utf8_to_utf16le(source, source_length, destbuf.out());
if (actual_length == 0) {
*status = U_INVALID_CHAR_FOUND;
@@ -239,12 +239,12 @@ MaybeLocal<Object> TranscodeUtf8FromUcs2(Environment* env,
const size_t source_length,
UErrorCode* status) {
const size_t length_in_chars = source_length / sizeof(UChar);
- size_t expected_utf8_length = simdutf::utf8_length_from_utf16(
+ size_t expected_utf8_length = simdutf::utf8_length_from_utf16le(
reinterpret_cast<const char16_t*>(source), length_in_chars);
MaybeStackBuffer<char> destbuf(expected_utf8_length);
auto actual_length =
- simdutf::convert_utf16_to_utf8(reinterpret_cast<const char16_t*>(source),
+ simdutf::convert_utf16le_to_utf8(reinterpret_cast<const char16_t*>(source),
length_in_chars,
destbuf.out());
This is based on V8 internally storing UTF-16 as little endian (even on big-endian platforms), e.g. lines 149 to 152 in 0260bbe and line 281 in 3660ad5.
@richardlau Good catch. The generic utf16 is system-dependent.
The added benchmark fails with the withoutintl build (
Thank you @richardlau for all the help.
Landed in 7c76fa0
PR-URL: #54153
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Minwoo Jung <nodecorelab@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Let's improve buffer.transcode performance for UTF-8 to UTF-16LE and UTF-16LE to UTF-8 transcoding. I'm working on a similar implementation in workerd at cloudflare/workerd#2462.

Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1591/