Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: add require transform pipeline #12349

Closed
wants to merge 3 commits into from

Conversation

Qard
Copy link
Member

@Qard Qard commented Apr 11, 2017

This feature simply passes module text content and filename through a transformer function which could be used for things like applying AST transforms to the source before it is compiled.

This is just an idea I'm playing with, I could use some feedback. Is this something people want? Is this a reasonable approach?

The ability to intercept the text content would make it easier to do several things like transpiling code, injecting code coverage hooks or applying custom instrumentation.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

src

This feature simply passes the content and filename through
a transformer function which could be used for things like
applying AST transforms to the source before it is run.
@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. module Issues and PRs related to the module subsystem. labels Apr 11, 2017
@@ -474,6 +478,7 @@
}

NativeModule._source = process.binding('natives');
NativeModule.preloadModuleWrap = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Personally I would set it to a null transform a => a and remove the if in line 552.
    I like polymorphism 😊

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just went with the null so there's no extra call in the vast majority of cases when it's not in-use. Probably not a big deal either way though.

lib/module.js Outdated
@@ -537,6 +537,10 @@ var resolvedArgv;
// the file.
// Returns exception, if any.
Module.prototype._compile = function(content, filename) {
if (NativeModule.preloadModuleWrap !== null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see 1

@refack
Copy link
Contributor

refack commented Apr 12, 2017

I like this "middleware" approach, it's what made express so powerful. It it's very elegant, and makes a lot of existing use cases cleaner; obviously transpilers like babel but also mocha with it's bdd interface, and other AOP modules.

@refack
Copy link
Contributor

refack commented Apr 12, 2017

Or maybe it should be eventemitter i.e. on('moduleLoading') on('moduleLoaded'), but I'm not sure since require and in the future import are synchronous.

@jasnell jasnell added the wip Issues and PRs that are still a work in progress. label Apr 12, 2017
Copy link
Contributor

@sam-github sam-github left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lacks docs so hard to understand how this would be used

lacks motivation in the PR description, some examples of how it would be used, and why this way is better would be helpful

@Qard
Copy link
Member Author

Qard commented Apr 12, 2017

Yep, no docs yet because:

  1. I'm more just looking for feedback on the idea currently.
  2. I'm not sure if it'd be something we'd want to document.

For an example, there's https://github.com/nodejs/node/pull/12349/files#diff-38ba9d7bed72af8741677acd3cb7ec2c which adds an extra line to the text before it gets parsed. More advanced uses would be to feed it into an AST parser and make modifications to inject code coverage or instrumentation hooks, then regenerating the code text to pass along to the parser.

As for why it's better: currently transpilers, code coverage tools and almost every APM agent makes extensive patches to the require function. APM is especially problematic because of the need for instrumenting core. Currently most APM providers monkey-patch everything at runtime, which can be incredibly fragile. If one wanted to try AST transform based instrumentation on userland modules they could patch module.constructor.prototype._compile (Not advisable due to being a private API) but that is not used with built-in modules--there's currently no way to apply pre-parse modifications to code text of built-in modules.

@sam-github
Copy link
Contributor

I'm more just looking for feedback on the idea currently.

I totally understand the desire to avoid docing something that might not happen, but in the absence of docs, its also hard to get feedback.

@refack
Copy link
Contributor

refack commented Apr 13, 2017

This implementation doesn't do anything to support multiple transforms

A good reason to use on('moduleLoading')

@Qard
Copy link
Member Author

Qard commented Apr 13, 2017

@sam-github Fair point. I'll write something up for it today.

@refack An event emitter would likely just complicate things. You can't return from an emitter, so it'd need separate channel to propagate the changed values back and I feel like that'd just get messy fast.

@Qard
Copy link
Member Author

Qard commented Apr 13, 2017

@sam-github I've added some documentation, if you'd like to have another look.

@Qard
Copy link
Member Author

Qard commented Apr 13, 2017

Hmm...I'm wondering if it might be a good balance of performance and flexibility to instead have require.addTransform(fn) and require.removeTransform(fn) to manage an array of functions with the same (content, filename) signature and then just apply them internally like this:

content = transforms.reduce((content, transform) => {
  return transform(content, filename)
}, content)

@refack
Copy link
Contributor

refack commented Apr 13, 2017

@refack An event emitter would likely just complicate things. You can't return from an emitter, so it'd need separate channel to propagate the changed values back and I feel like that'd just get messy fast.

ironically EventEmmiter is synchronous https://nodejs.org/api/events.html#events_asynchronous_vs_synchronous

so

var e = {filename, source}
this.emit('moduleLoading', e)
var ret = e.source //works

@Qard
Copy link
Member Author

Qard commented Apr 13, 2017

Yep, I'm aware. I hadn't thought of the object modifying approach...modifying an object in code that doesn't own it seems a bit questionable to me though, to be honest. 😟

@refack
Copy link
Contributor

refack commented Apr 13, 2017

It's not a trick, it's a valid and common pattern, like express's middleware, or DOM events (e.stopPropogation() or BeforeUnloadEvent) where you explicitly declate some properties in/out and some readonly (AudioProcessingEvent), that's way it's synchronous.

@Qard
Copy link
Member Author

Qard commented Apr 13, 2017

I refactored to use require.addTransform(transform) and require.removeTransform(transform) functions, which should make it more user-friendly without sacrificing much in terms of performance. I also updated the docs to match the refactor.

How does it look?

@Qard
Copy link
Member Author

Qard commented Apr 13, 2017

@nodejs/diagnostics

@Qard Qard changed the title src: let preloaded modules set source transformer src: add require transform pipeline Apr 13, 2017
@AndreasMadsen
Copy link
Member

AndreasMadsen commented Apr 13, 2017

If I remember correctly Module._extensions was a long time ago public API, why was it made private? This feature looks very similar but perhaps less flexible.

@Qard
Copy link
Member Author

Qard commented Apr 13, 2017

I don't think it was ever possible to patch built-in modules with Module._extensions though.

As for why it was made private, I don't think it was ever strictly intended to be "public" in the first place. It just happened to not use an underscore in the early days of node, so people just assumed it was fair game to mess with it.

Also, Module._extensions doesn't receive the code text, so it's never really been possible to modify the pre-compile content without patching the _compile function, which was probably never a good idea, it being even more certainly private...

refack
refack previously requested changes Apr 14, 2017
return next;
}, content);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you reimplemented EventEmmiter only with an explicit return. Do you really feel it's worth the duplication?
You could instead just wrap it, and win!

var ee = new EventEmitter();
var tMap = new WeakMap();
function applyTransforms(content, filename) {
  var e = {content, filename};
  ee.emit('moduleLoading', e);
  return e.content;
}
function addTransform(transform) {
  var t = (e) => {e.content = transform(e.content, e.filename)};
  tmap.add(transform, t);
  ee.addListener('moduleLoading', t)
}
function removeTransform(transform) { ee.removeListener('moduleLoading', tMap.get(transform))}

Copy link
Member Author

@Qard Qard Apr 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Event emitters are much bigger and more complicated than this needs.

That event emitter implementation has a lot more closures and object allocations. Event emitters also inherently have a much more complicated lookup scheme.

As require is a major startup hot-path, it's best not to put event emitters in the middle of that. Keep in mind this code would typically run many thousands of times during app startup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

premature optimization is the root of all evil...
IMHO code duplication is worse than a minor performance impact. I'm sure the actual compile is orders of magnitude heavier.
Also EE are super fast when there are no listeners.
Let's test and see...

Copy link
Member Author

@Qard Qard Apr 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly code duplication is a thing to avoid generally, but the code to wrap an event emitter around it has just as much surface area as the custom implementation itself, without even taking into account all the extra stuff running inside the event emitter implementation. It's also a lot more difficult for someone unfamiliar with the code to understand at first glance. Were this something more complicated, an event emitter implementation would definitely make more sense, but it has little advantage here.

Also, the event emitter implementation is 2.5-3 times slower and has 11 times the memory footprint. It's not premature optimization, it's carefully considered and diligently measured optimization. 😉

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empirical measurement wins (even named my first company Empeeric)!

@refack refack dismissed their stale review April 14, 2017 03:51

Changes were discussed, but PR is not yet complete...

@Qard
Copy link
Member Author

Qard commented Apr 14, 2017

I just wanted to elaborate a bit on my own personal reasoning for this PR:

My use for this is as an alternative to monkey-patch-based performance monitoring instrumentation. Typically in a performance monitoring agent, it will apply instrumentation by wrapping functions in closures that add behaviour before running the original function and before running the original callback. This creates many, many closures for every function that is wrapped. In a heavily instrumented environment, the extra performance impact at high load can become problematic.

There's also a lot of risk in wrapping code at runtime due to the potential for shifting, added or removed parameters, function.length checks, accidentally triggering getters/setters, inability to reliably inspect internal behaviour programmatically to inform decisions on where to place instrumentation hooks, etc.

My hope is to provide a very usable alternative to the typical monkey-patching style through AST manipulation. This would eliminate most code behaviour quirks and would remove all those closures, taking a lot of extra indirection out of potentially numerous hot-paths in the execution of a user's app. This would enable much safer and lower impact real-time production monitoring.

@Qard
Copy link
Member Author

Qard commented Apr 15, 2017

@thefourtheye Oh, good catch. I'll see if I can find some time to fix it later tonight (visiting family for Easter right now). Any thoughts on the concept though?

@Qard
Copy link
Member Author

Qard commented Apr 15, 2017

Fixed the missing semi-colons and type info in the docs.

const transforms = [];

function addTransform(transform) {
transforms.push(transform);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert that transform is a function here.

@@ -245,6 +245,55 @@ added: v0.3.0
Use the internal `require()` machinery to look up the location of a module,
but rather than loading the module, just return the resolved filename.

### require.addTransform(transform)

* `transform` {function} A function to use to transform module text given the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{Function}


### require.removeTransform(transform)

* `transform` {function} A function previously given to `require.addTransform()`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

return transforms.reduce((content, transform) => {
const next = transform(content, filename);
if (typeof next !== 'string') {
throw new Error('Module transforms must return the modified content');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to make it known which module transform is faulty?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could always attach the failing function to the error object before throwing.

@TimothyGu
Copy link
Member

I'm okay with the idea, and I recall some third-party module on npm doing something similar. However:

  1. Should we support source maps?
  2. How does this interact with existing module._extensions-based addons?
  3. How does this work with non-.js files?

@Qard
Copy link
Member Author

Qard commented Apr 15, 2017

  1. My feeling is source maps should probably be left to userland, but I haven't put much thought into that.
  2. and 3. As this is triggered by module._compile(), anything that calls that should work. Currently that is only the .js extension and so supporting other extensions would require either copying the .js extension handler or writing your own which calls module._compile() internally. (Which is not ideal, being a private method...)

Perhaps there should be some way to alter the extension too, which could change the behaviour of the pipeline? Not sure how that'd work exactly...

This is a simple pipeline to enable applying transform
functions to loaded JavaScript files for various purposes
such as transpiling, creating test mocks, recording code
coverage or applying custom instrumentation.


```js
require.addTransform((content, filepath) => {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filename?

@pmuellr
Copy link

pmuellr commented Apr 17, 2017

Should we support source maps?

Seems like it would "just work" if the transforms did all the work themselves - both generating the sourcemap (obviously), but also reading and interpreting any incoming sourcemaps. The onus is on the transform to do generate the sourcemap data anyway; it couldn't be done outside the transform.

I could imagine a couple of ways sourcemap handling might be made easier for transforms, but seems pretty clear we'd have to play with this quite a bit to figure out how to do it right in practice. Eg, I'd bet you don't want "full" sourcemap support in production, but might want enough for stack trace generation.

But as a general "extensibility" concern, I wonder if the incoming parameters and outgoing response of the transform should be ... more extendable. Eg, have the transform return {content: transformedContent} to allow for more things to be returned later.

Copy link
Contributor

@evanlucas evanlucas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really excited about this. Hopefully, we can add it without any performance impact

@@ -537,6 +537,8 @@ var resolvedArgv;
// the file.
// Returns exception, if any.
Module.prototype._compile = function(content, filename) {
content = internalModule.applyTransforms(content, filename);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what the performance impact is here.

@@ -245,6 +245,49 @@ added: v0.3.0
Use the internal `require()` machinery to look up the location of a module,
but rather than loading the module, just return the resolved filename.

### require.addTransform(transform)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work on builtin modules?

@jasnell
Copy link
Member

jasnell commented Apr 18, 2017

I can't really say that I'm a fan of this, largely because I do not see why it needs to be in core. Perhaps I just need to understand the use cases more.

There are also questions about how this would interplay with ES6 modules, debugging, monkey patching, and so on.

An EPS might be a good way to describe the feature and use cases in depth while you're working on this, so that we can better understand it.

@mhdawson
Copy link
Member

I had the same thought that an eps would be a good way do doc/explain why it need to be in core.

Having said that, I also understand the desire to get feedback before deciding to do that and code is often the easiest way to do that. So just to say if you do want to propose adding it, its a significant enough addition to warrant an eps but it is interesting discuss in advance as well.

What you have looks interesting to me...

@Qard
Copy link
Member Author

Qard commented Apr 20, 2017

Yep, I agree with the EPS suggestion. I just wasn't sure yet exactly what the shape of a proposal would be, so I put together some code that solves my particular need as a proof-of-concept. I don't mind if this PR gets rejected, just wanted to spark some conversation. 👍

@Trott
Copy link
Member

Trott commented May 1, 2017

Since this has been referred to the EP process, I'm going to remove the ctc-review label.

@BridgeAR
Copy link
Member

@Qard I know that this is blocked by the EPS but as there is no progress in that thread for a long time - is this something you still want to follow up on?

@Qard
Copy link
Member Author

Qard commented Sep 12, 2017

I still want the feature myself, but it doesn't seem like there's much interest in it. Maybe worth reevaluating after ESM has been out for a bit.

@BridgeAR
Copy link
Member

@Qard ok, I am not certain what we should do with the PR in the meanwhile. Would you want to keep it open or is it fine to close it?

@Qard
Copy link
Member Author

Qard commented Sep 20, 2017

Closing. I'll re-evaluate later to see how it applies with ESM stuff now existing.

@Qard Qard closed this Sep 20, 2017
@Qard Qard deleted the preload-module-extras branch August 11, 2021 06:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lib / src Issues and PRs related to general changes in the lib or src directory. module Issues and PRs related to the module subsystem. stalled Issues and PRs that are stalled. wip Issues and PRs that are still a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.