Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: improve require() performance #10789

Merged
merged 10 commits into from
Mar 11, 2017
Merged

Conversation

mscdex
Copy link
Contributor

@mscdex mscdex commented Jan 13, 2017

Benchmark results for the changes in this PR:

                                                                        improvement significant      p.value
 module/module-loader.js useCache="false" fullPath="false" thousands=50      5.67 %         *** 8.756066e-18
 module/module-loader.js useCache="false" fullPath="true" thousands=50       6.33 %         *** 1.120897e-16
 module/module-loader.js useCache="true" fullPath="false" thousands=50     125.20 %         *** 4.165830e-37
 module/module-loader.js useCache="true" fullPath="true" thousands=50      131.80 %         *** 2.755969e-34

First off, I recognize the module module is "locked" so I understand there is a possibility only some or none of these commits may be accepted. Either way I have initially marked this as semver-major because of a couple of changes, if they end up not being an issue I will happily remove the semver-major label:

The first is mscdex/io.js@eb3b717 which changes the format of the key used in Module._pathCache. Instead of using JSON, a simple delimiter is used between the string values being used. This commit probably provides the single largest performance increase by itself, however I am not sure just how many people are making assumptions about the format of the cache keys themselves. A quick search on github at least reveals that there are people directly accessing Module._pathCache, but they seem to just be iterating over the properties and deleting them when they include some substring (typically a module name the user is wanting to "uncache", in addition to deleting from require.cache).

Secondly, there are the changes in mscdex/io.js@3d8528d, which may cause issues if anyone is both directly accessing any of those objects and using obj.hasOwnProperty().

All of the other changes should be backwards compatible AFAIK.

CI: https://ci.nodejs.org/job/node-test-pull-request/5845/
CITGM: https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/523/

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)
  • benchmark
  • fs
  • module

@mscdex mscdex added module Issues and PRs related to the module subsystem. performance Issues and PRs related to the performance of Node.js. semver-major PRs that contain breaking changes and should be released in the next major version. labels Jan 13, 2017
@nodejs-github-bot nodejs-github-bot added fs Issues and PRs related to the fs subsystem / file system. module Issues and PRs related to the module subsystem. repl Issues and PRs related to the REPL subsystem. labels Jan 13, 2017
@mscdex mscdex added the wip Issues and PRs that are still a work in progress. label Jan 13, 2017
@mscdex mscdex removed the wip Issues and PRs that are still a work in progress. label Jan 13, 2017
Copy link
Member

@jasnell jasnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not going to approve yet until after we get some good smoketesting done on this but really happy to see this.

Module._extensions = {};
Module._cache = Object.create(null);
Module._pathCache = Object.create(null);
Module._extensions = Object.create(null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, this is the only bit that may be concerning. I'm hugely +1 on making these changes, but there's obvious risk here. We'll want to smoke test this thoroughly /cc @nodejs/citgm

Copy link
Contributor Author

@mscdex mscdex Jan 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I started citgm, but it looks like there may be some infrastructure issues, since there was some trouble on downloading various npm packages (not always the same ones) on pretty much every citgm machine.

}
var cacheKey = request + '\x00' +
(paths.length === 1 ? paths[0] : paths.join('\x00'));
var entry = Module._pathCache[cacheKey];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's too bad that we can't use the github emoji thumbsup on individual changes ;-)

@mscdex
Copy link
Contributor Author

mscdex commented Feb 17, 2017

Rebased.

CI again: https://ci.nodejs.org/job/node-test-pull-request/6464/

/cc @nodejs/ctc Any interest in this?

@targos targos self-assigned this Feb 17, 2017
@targos targos removed their assignment Feb 17, 2017
@targos
Copy link
Member

targos commented Feb 17, 2017

@mscdex
Copy link
Contributor Author

mscdex commented Feb 18, 2017

As far as I can tell, none of the failures in the CITGM runs seem to be related to the changes in this PR.

@gibfahn
Copy link
Member

gibfahn commented Feb 19, 2017

As far as I can tell, none of the failures in the CITGM runs seem to be related to the changes in this PR.

cc/ @nodejs/citgm

@mscdex
Copy link
Contributor Author

mscdex commented Feb 19, 2017

@gibfahn Some of the failures are due to the http outgoing._headers change, which should be resolved when #10941 and #10805 are landed.

Copy link
Member

@mhdawson mhdawson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I'm LGTM on this as a semver major.

});
} else {
process.nextTick(LOOP);
}
}
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of interest how much perf gain did we get from this one ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the async realpath()? I didn't measure it explicitly because it's not used for module lookup, I merely made the changes to mirror the realpathSync() changes for consistency.

@ofrobots
Copy link
Contributor

This change will probably affect APM code that depend on the module loader internals for monkey patching. Paging @watson @hayes @matthewloring.

@mscdex
Copy link
Contributor Author

mscdex commented Feb 28, 2017

@ofrobots How? The new behavior for Module._resolveLookupPaths() is opt-in. The only problem would be if someone was actually replacing that function, which the code you linked to doesn't seem to be doing.

@ofrobots
Copy link
Contributor

My understanding was Module._resolveLookupPaths can now return null. The linked code wasn't expecting this. Note that I am not stating intent for or against this change, but I want to make sure this isn't a surprise for APM folks.

@mscdex
Copy link
Contributor Author

mscdex commented Feb 28, 2017

@ofrobots It will only return such new values if opting into such behavior by passing a 'true' value as a third argument (newReturn parameter) to Module._resolveLookupPaths().

@ofrobots
Copy link
Contributor

Ack. Again, just trying to increase visibility for APM folks (who aren't covered by CITGM) who might be subtly dependent (or not).

jungx098 pushed a commit to jungx098/node that referenced this pull request Mar 21, 2017
PR-URL: nodejs#10789
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
jungx098 pushed a commit to jungx098/node that referenced this pull request Mar 21, 2017
nullCheck() implicitly converts the argument to string when checking
the value, so this commit avoids any unnecessary additional (Buffer)
conversions to string.

PR-URL: nodejs#10789
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
jungx098 pushed a commit to jungx098/node that referenced this pull request Mar 21, 2017
Replacing the path separator-finding regexp with a custom function
results in a measurable improvement in performance.

PR-URL: nodejs#10789
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
jungx098 pushed a commit to jungx098/node that referenced this pull request Mar 21, 2017
PR-URL: nodejs#10789
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
jungx098 pushed a commit to jungx098/node that referenced this pull request Mar 21, 2017
PR-URL: nodejs#10789
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
jungx098 pushed a commit to jungx098/node that referenced this pull request Mar 21, 2017
hasOwnProperty() is known to be slow, do a direct lookup on a "clean"
object instead.

PR-URL: nodejs#10789
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
jungx098 pushed a commit to jungx098/node that referenced this pull request Mar 21, 2017
Using a more "direct" method of function calling yields better
performance.

PR-URL: nodejs#10789
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
jungx098 pushed a commit to jungx098/node that referenced this pull request Mar 21, 2017
By avoiding JSON.stringify() and simply joining the strings with a
delimiter that does not appear in paths, we can improve cached
require() performance by at least 50%.

Additionally, this commit removes the last source of permanent
function deoptimization (const) for _findPath().

PR-URL: nodejs#10789
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
jungx098 pushed a commit to jungx098/node that referenced this pull request Mar 21, 2017
This commit consists of two changes:

* Avoids returning request/id *just* for the debug() output
* Returns `null` instead of an empty array for the list of paths

PR-URL: nodejs#10789
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
@jasnell jasnell mentioned this pull request Apr 4, 2017
@mscdex mscdex mentioned this pull request Apr 24, 2017
3 tasks
@benjamingr
Copy link
Member

Hey, this broke Ghost - can someone from @nodejs/citgm tell me how come CITGM didn't catch it? (Maybe Ghost isn't that popular - but it broke https://www.npmjs.com/package/require-dir with 600k downloads).

@targos
Copy link
Member

targos commented Jul 10, 2017

@benjamingr require-dir is not directly tested in CITGM. Can you be more specific about what was broken?

@TimothyGu
Copy link
Member

@targos aseemk/requireDir#45

@benjamingr
Copy link
Member

@targos sure, require.extensions.hasOwnPropertyNames is used in require-dir which is itself used in some pretty popular packages.

I think we could have done a better job in communicating these changes to require-dir (or waiting with them and showing a deprecation warning for one version).

@mscdex
Copy link
Contributor Author

mscdex commented Jul 10, 2017

@benjamingr This was already fixed in require-dir earlier this year. Ghost just needs to update its dependency or require-dir needs to publish a new version.

@mscdex
Copy link
Contributor Author

mscdex commented Jul 10, 2017

@benjamingr Besides, you cannot really deprecate the change made to require.extensions.

@benjamingr
Copy link
Member

@mscdex allow me to clarify - this is about what could have been done better on our side - not about userland being weird.

When I saw (and signed off on) this change - I saw the comment by @jasnell and your discussion about the Object.create(null) change. I'm wondering what I could have done better as a reviewer.

The conclusion might be to add more packages to CITGM, or to add deprecation warnings in the future before these sort of changes , or to guide users to not rely on prototypes of objects used as maps in the docs.

I'm not sure what we should have done differently - but in practice I'm experiencing a hard time from a user point of view (upgrading a system from Node 4 to 8) which I did not anticipate as a project member. According to the issue @TimothyGu linked to this happened to several users and not just me.

This is more about how we handle future changes - if you'd prefer it if I took these to the CTC (or main) repo for that sort of discussion that's also fine :)

I'd rather focus on how to prevent these sort of breakages in the future rather than the specific issue here - the Object.create part in this PR was small in a PR full of improvements and we missed it - I'm not sure what you as PR author could have done differently in this case to be honest, mostly - I think CITGM could have caught it.

@targos
Copy link
Member

targos commented Jul 10, 2017

I think we could have done a better job in communicating these changes to require-dir

We couldn't have done that because we weren't aware that require-dir was affected.

@benjamingr
Copy link
Member

@targos I realize that :)

Is there a process to add a package to CITGM or affect which packages are there?

I think if we go through packages that are popular but not dependencies of other popular packages - we should find a bunch of useful packages to add to CITGM (Ghost being an example).

@targos
Copy link
Member

targos commented Jul 10, 2017

https://github.com/nodejs/citgm/blob/master/CONTRIBUTING.md#submitting-a-module-to-citgm

Edit: the packages are in https://github.com/nodejs/citgm/blob/master/lib/lookup.json

@benjamingr
Copy link
Member

@targos Thanks! As a preference of citgm - in this case - would I add require-dir or Ghost?

@targos
Copy link
Member

targos commented Jul 10, 2017

I think require-dir would be a very good addition. It has 769 dependent packages.

@mcollina
Copy link
Member

@benjamingr It is really hard for anyone to understand the impact that it will have on the ecosystem. This is unfortunate, and I do not think there is a true solution, as the ecosystem is growing and more packages are added. We can protect only on things we broke in the past by adding them to citgm.

On this specific issue, Ghost should have updated its own dependencies, as this was already fixed. I do not see how adding require-dir to CITGM fixed this problem, as Ghost depended on an older version. I think adding Ghost would be very hard, so the best approach is to enable greenkeeper or something similar there.

@benjamingr
Copy link
Member

@mcollina

It is really hard for anyone to understand the impact that it will have on the ecosystem. This is unfortunate, and I do not think there is a true solution, as the ecosystem is growing and more packages are added. We can protect only on things we broke in the past by adding them to citgm.

(Note in the specific case require-dir fixed this after we broke it)

I agree that there is no way we can solve this for the general case - but if we had a downloads/dependents threshold for automatically suggesting CITGM packages I think it would go a long way towards ensuring backwards compatibility or at least give the community time to adapt.

@gibfahn
Copy link
Member

gibfahn commented Jul 12, 2017

I agree that there is no way we can solve this for the general case - but if we had a downloads/dependents threshold for automatically suggesting CITGM packages I think it would go a long way towards ensuring backwards compatibility or at least give the community time to adapt.

We currently test ~100 modules, I suspect to make this worthwhile we'd need to be testing a larger percentage of the hundreds of thousands of modules on npm. The limiting factor isn't (yet) the time taken to run on our platforms, it's the manual work needed to triage test failures.

@benjamingr
Copy link
Member

@gibfahn I'll try and get this issue some warmth and love on the 19th when I'm onboarding first time contributors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system. module Issues and PRs related to the module subsystem. performance Issues and PRs related to the performance of Node.js. repl Issues and PRs related to the REPL subsystem. semver-major PRs that contain breaking changes and should be released in the next major version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.