Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buffer: re-enable Fast API for Buffer.write #54526

Merged
merged 1 commit into from
Sep 4, 2024

Conversation

ronag
Copy link
Member

@ronag ronag commented Aug 23, 2024

Re-enables fast Fast API for Buffer.write after fixing UTF8 handling.

13:05:25 buffers/buffer-write-string-short.js n=1000000 len=1 encoding='ascii'                            ***    266.52 %       ±7.27%  ±9.75% ±12.84%
13:05:25 buffers/buffer-write-string-short.js n=1000000 len=1 encoding='latin1'                           ***    201.16 %       ±4.74%  ±6.33%  ±8.29%
13:05:25 buffers/buffer-write-string-short.js n=1000000 len=1 encoding='utf8'                             ***    242.07 %       ±6.29%  ±8.44% ±11.14%
13:05:25 buffers/buffer-write-string-short.js n=1000000 len=16 encoding='ascii'                           ***    264.81 %       ±5.88%  ±7.86% ±10.29%
13:05:25 buffers/buffer-write-string-short.js n=1000000 len=16 encoding='latin1'                          ***    197.28 %       ±6.96%  ±9.32% ±12.26%
13:05:25 buffers/buffer-write-string-short.js n=1000000 len=16 encoding='utf8'                            ***    349.02 %       ±8.80% ±11.84% ±15.68%
13:05:25 buffers/buffer-write-string-short.js n=1000000 len=32 encoding='ascii'                           ***    271.75 %       ±8.70% ±11.69% ±15.43%
13:05:25 buffers/buffer-write-string-short.js n=1000000 len=32 encoding='latin1'                          ***    214.58 %       ±5.85%  ±7.84% ±10.32%
13:05:25 buffers/buffer-write-string-short.js n=1000000 len=32 encoding='utf8'                            ***    351.17 %       ±7.64% ±10.28% ±13.60%
13:05:25 buffers/buffer-write-string-short.js n=1000000 len=8 encoding='ascii'                            ***    273.30 %       ±8.65% ±11.63% ±15.38%
13:05:25 buffers/buffer-write-string-short.js n=1000000 len=8 encoding='latin1'                           ***    191.62 %       ±5.62%  ±7.52%  ±9.88%
13:05:25 buffers/buffer-write-string-short.js n=1000000 len=8 encoding='utf8'                             ***    224.55 %       ±4.72%  ±6.33%  ±8.33%

@nodejs-github-bot nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Aug 23, 2024
src/node_buffer.cc Outdated Show resolved Hide resolved
@ronag ronag force-pushed the fast-utf8-2 branch 2 times, most recently from a930ad4 to 0fda693 Compare August 23, 2024 14:45
@ronag ronag marked this pull request as ready for review August 23, 2024 14:46
@ronag ronag force-pushed the fast-utf8-2 branch 2 times, most recently from 23f1b23 to 8be5c43 Compare August 23, 2024 14:50
{
let i = 0;

while (i < 1_000_000) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a non-deterministic test is unpleasant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can trigger fast api call using the method @targos implemented in his previous pull requests.

Copy link
Member

@lemire lemire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to me. I wish we could use a deterministic test. Having to loop a million times to trigger and issue is both slow and error prone.

Copy link

codecov bot commented Aug 23, 2024

Codecov Report

Attention: Patch coverage is 88.88889% with 8 lines in your changes missing coverage. Please review.

Project coverage is 87.60%. Comparing base (981c701) to head (0ae2ae1).
Report is 252 commits behind head on main.

Files with missing lines Patch % Lines
src/node_buffer.cc 88.88% 3 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #54526      +/-   ##
==========================================
- Coverage   87.61%   87.60%   -0.01%     
==========================================
  Files         650      650              
  Lines      182835   182889      +54     
  Branches    35382    35389       +7     
==========================================
+ Hits       160185   160227      +42     
- Misses      15925    15927       +2     
- Partials     6725     6735      +10     
Files with missing lines Coverage Δ
src/node_buffer.cc 71.41% <88.88%> (+1.46%) ⬆️

... and 32 files with indirect coverage changes

@ronag ronag added the request-ci Add this label to start a Jenkins CI on a PR. label Aug 23, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Aug 23, 2024
@nodejs-github-bot

This comment was marked as outdated.

@ronag ronag added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Aug 23, 2024
@ronag ronag mentioned this pull request Aug 23, 2024
Copy link
Member

@jasnell jasnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some lint issues but otherwise LGTM

@ronag ronag added the request-ci Add this label to start a Jenkins CI on a PR. label Aug 24, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Aug 24, 2024
@nodejs-github-bot

This comment was marked as outdated.

@blexrob
Copy link

blexrob commented Aug 24, 2024

With the current commits, the following test will fail:

let i = 0;
const testStr = "\xc2\x80"; // when stored in 1-byte chars, this is a valid UTF-8 encoding
const expected = Buffer.from(testStr).toString("hex");
for(; i < 1_000_000; i++) {
  const buf = Buffer.from(testStr);
  const ashex = buf.toString("hex");
  if (ashex !== expected) {
    console.log(`Decoding changed in iteration ${i} when changing to FastWriteStringUTF8, got ${ashex}, expected ${expected}`);
    break;
  }
}

if(i<1_000_000) {
  console.error("FAILED after %d iterations",i);
} else
  console.log("PASSED after %d iterations",i);

The source string is stored as UTF16 (compressed to 1 byte), but still needs to be converted to UTF-8. The fast path can only be implemented as memcpy when only characters from the range 0-127 are present.

@ronag
Copy link
Member Author

ronag commented Aug 24, 2024

The source string is stored as UTF16 (compressed to 1 byte), but still needs to be converted to UTF-8. The fast path can only be implemented as memcpy when only characters from the range 0-127 are present.

I thought FastOneByteString means that it's only one byte chars?

@ronag
Copy link
Member Author

ronag commented Aug 24, 2024

< when only characters from the range 0-127 are present

@lemira does simdutf have this? validate_ascii?

@nodejs-github-bot
Copy link
Collaborator

@RedYetiDev RedYetiDev removed the commit-queue-failed An error occurred while landing this pull request using GitHub Actions. label Sep 1, 2024
@RedYetiDev
Copy link
Member

RedYetiDev commented Sep 1, 2024

Given the repeated activity since the failed commit, I assume it is known that the commit-queue originally failed, so I removed the label. Feel free to re-add.

@nodejs-github-bot
Copy link
Collaborator

@ronag ronag force-pushed the fast-utf8-2 branch 2 times, most recently from 164f7f2 to 34edef6 Compare September 2, 2024 05:06
@ronag ronag changed the title buffer: validate UTF8 on fast path buffer: re-enable Fast API for Buffer.write Sep 2, 2024
@RedYetiDev
Copy link
Member

Correct me if I'm wrong, but shouldn't the "PR-URL" meta be added by the bot?

@targos
Copy link
Member

targos commented Sep 2, 2024

Correct me if I'm wrong, but shouldn't the "PR-URL" meta be added by the bot?

You're right. @ronag please remove the PR-URL from the commit.

Re-enables fast Fast API for Buffer.write after fixing
UTF8 handling.

Fixes: nodejs#54521
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

lemire added a commit to simdutf/simdutf that referenced this pull request Sep 3, 2024
* feat: convert_latin1_to_utf8_s

Adds a "safe" version of convert_latin1_to_utf8 with a maximum output length.

Refs: nodejs/node#54526

* various fixes

* [no-ci] just some code reformatting

---------

Co-authored-by: Robert Nagy <ronagy@icloud.com>
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

CI: https://ci.nodejs.org/job/node-test-pull-request/61926/

@anonrig anonrig added the commit-queue Add this label to land a pull request using GitHub Actions. label Sep 4, 2024
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Sep 4, 2024
@nodejs-github-bot nodejs-github-bot merged commit dc74f17 into nodejs:main Sep 4, 2024
54 checks passed
@nodejs-github-bot
Copy link
Collaborator

Landed in dc74f17

aduh95 pushed a commit that referenced this pull request Sep 12, 2024
Re-enables fast Fast API for Buffer.write after fixing
UTF8 handling.

Fixes: #54521
PR-URL: #54526
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Paolo Insogna <paolo@cowtech.it>
@RafaelGSS RafaelGSS mentioned this pull request Sep 16, 2024
@targos targos added the dont-land-on-v20.x PRs that should not land on the v20.x-staging branch and should not be released in v20.x. label Sep 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. dont-land-on-v20.x PRs that should not land on the v20.x-staging branch and should not be released in v20.x. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.