src: add require transform pipeline #12349

Qard · 2017-04-11T23:40:37Z

This feature simply passes module text content and filename through a transformer function which could be used for things like applying AST transforms to the source before it is compiled.

This is just an idea I'm playing with, I could use some feedback. Is this something people want? Is this a reasonable approach?

The ability to intercept the text content would make it easier to do several things like transpiling code, injecting code coverage hooks or applying custom instrumentation.

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
documentation is changed or added
commit message follows commit guidelines

Affected core subsystem(s)

src

This feature simply passes the content and filename through a transformer function which could be used for things like applying AST transforms to the source before it is run.

refack · 2017-04-12T00:40:32Z

lib/internal/bootstrap_node.js

@@ -474,6 +478,7 @@
  }

  NativeModule._source = process.binding('natives');
+  NativeModule.preloadModuleWrap = null;


Personally I would set it to a null transform a => a and remove the if in line 552.
I like polymorphism 😊

I just went with the null so there's no extra call in the vast majority of cases when it's not in-use. Probably not a big deal either way though.

refack · 2017-04-12T00:41:08Z

lib/module.js

@@ -537,6 +537,10 @@ var resolvedArgv;
 // the file.
 // Returns exception, if any.
 Module.prototype._compile = function(content, filename) {
+  if (NativeModule.preloadModuleWrap !== null) {


refack · 2017-04-12T00:47:30Z

I like this "middleware" approach; it's what made express so powerful. It's very elegant, and makes a lot of existing use cases cleaner: obviously transpilers like babel, but also mocha with its BDD interface, and other AOP modules.

refack · 2017-04-12T01:05:35Z

Or maybe it should be an EventEmitter, i.e. on('moduleLoading') and on('moduleLoaded'), but I'm not sure, since require and (in the future) import are synchronous.

sam-github

lacks docs so hard to understand how this would be used

lacks motivation in the PR description, some examples of how it would be used, and why this way is better would be helpful

Qard · 2017-04-12T21:27:46Z

Yep, no docs yet because:

I'm more just looking for feedback on the idea currently.
I'm not sure if it'd be something we'd want to document.

For an example, there's https://github.com/nodejs/node/pull/12349/files#diff-38ba9d7bed72af8741677acd3cb7ec2c which adds an extra line to the text before it gets parsed. More advanced uses would be to feed it into an AST parser and make modifications to inject code coverage or instrumentation hooks, then regenerating the code text to pass along to the parser.
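For illustration only, a transform in the spirit of that test could look like the sketch below. The name `addBanner` and the banner text are made up here; only the `(content, filename)` signature comes from this PR.

```js
'use strict';

// Hypothetical transform: receives the module source text and its filename,
// and must return the (possibly modified) source text as a string.
function addBanner(content, filename) {
  // Only touch .js files; pass everything else through unchanged.
  if (!filename.endsWith('.js')) return content;
  return `/* loaded via transform: ${filename} */\n${content}`;
}

const transformed = addBanner('module.exports = 42;', '/tmp/example.js');
console.log(transformed);
```

An AST-based transform would have the same shape: parse `content`, modify the tree, and return the regenerated source text.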

As for why it's better: currently transpilers, code coverage tools and almost every APM agent make extensive patches to the require function. APM is especially problematic because of the need to instrument core. Currently most APM providers monkey-patch everything at runtime, which can be incredibly fragile. If one wanted to try AST-transform-based instrumentation on userland modules, they could patch module.constructor.prototype._compile (not advisable, as it is a private API), but that is not used for built-in modules; there's currently no way to apply pre-parse modifications to the code text of built-in modules.

sam-github · 2017-04-13T15:30:55Z

I'm more just looking for feedback on the idea currently.

I totally understand the desire to avoid documenting something that might not happen, but in the absence of docs, it's also hard to get feedback.

refack · 2017-04-13T16:51:13Z

This implementation doesn't do anything to support multiple transforms

A good reason to use on('moduleLoading')

Qard · 2017-04-13T17:04:24Z

@sam-github Fair point. I'll write something up for it today.

@refack An event emitter would likely just complicate things. You can't return from an emitter, so it'd need a separate channel to propagate the changed values back and I feel like that'd just get messy fast.

Qard · 2017-04-13T17:46:57Z

@sam-github I've added some documentation, if you'd like to have another look.

Qard · 2017-04-13T20:32:20Z

Hmm...I'm wondering if it might be a good balance of performance and flexibility to instead have require.addTransform(fn) and require.removeTransform(fn) to manage an array of functions with the same (content, filename) signature and then just apply them internally like this:

content = transforms.reduce((content, transform) => {
  return transform(content, filename)
}, content)
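A minimal, self-contained sketch of that idea follows. It assumes a module-level `transforms` array and free-standing functions; how they would be attached to `require` is left out, and the two toy transforms are hypothetical.

```js
'use strict';

// Assumed internal state: an ordered list of registered transforms.
const transforms = [];

function addTransform(transform) {
  transforms.push(transform);
}

function removeTransform(transform) {
  const index = transforms.indexOf(transform);
  if (index !== -1) transforms.splice(index, 1);
}

// Feed the output of each transform into the next one.
function applyTransforms(content, filename) {
  return transforms.reduce(
    (current, transform) => transform(current, filename),
    content
  );
}

// Usage: two toy transforms applied in registration order.
const upper = (content) => content.toUpperCase();
const tag = (content, filename) => `${content} // ${filename}`;
addTransform(upper);
addTransform(tag);
console.log(applyTransforms('foo()', 'a.js')); // FOO() // a.js
removeTransform(upper);
console.log(applyTransforms('foo()', 'a.js')); // foo() // a.js
```

When no transform is registered, `reduce` simply returns the initial `content`, so the common case does no extra work beyond one function call.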

refack · 2017-04-13T20:35:55Z

@refack An event emitter would likely just complicate things. You can't return from an emitter, so it'd need a separate channel to propagate the changed values back and I feel like that'd just get messy fast.

Ironically, EventEmitter is synchronous: https://nodejs.org/api/events.html#events_asynchronous_vs_synchronous

so

var e = {filename, source}
this.emit('moduleLoading', e)
var ret = e.source //works
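Spelled out as a runnable sketch (the `loader` emitter, the `load` helper and the listener body are assumptions made for illustration, not code from the PR):

```js
'use strict';
const EventEmitter = require('events');

const loader = new EventEmitter();

// A listener rewrites the in/out `source` property on the event object.
loader.on('moduleLoading', (e) => {
  e.source = `'use strict';\n${e.source}`;
});

function load(filename, source) {
  const e = { filename, source };
  // emit() calls every listener synchronously, in registration order,
  // so any mutation is visible as soon as emit() returns.
  loader.emit('moduleLoading', e);
  return e.source;
}

console.log(load('a.js', 'exports.x = 1;'));
```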

Qard · 2017-04-13T20:42:16Z

Yep, I'm aware. I hadn't thought of the object modifying approach...modifying an object in code that doesn't own it seems a bit questionable to me though, to be honest. 😟

refack · 2017-04-13T20:52:41Z

It's not a trick, it's a valid and common pattern, like express's middleware, or DOM events (e.stopPropagation() or BeforeUnloadEvent) where you explicitly declare some properties in/out and some readonly (AudioProcessingEvent); that's why it's synchronous.

Qard · 2017-04-13T22:22:47Z

I refactored to use require.addTransform(transform) and require.removeTransform(transform) functions, which should make it more user-friendly without sacrificing much in terms of performance. I also updated the docs to match the refactor.

How does it look?

Qard · 2017-04-13T22:24:43Z

@nodejs/diagnostics

AndreasMadsen · 2017-04-13T22:38:30Z

If I remember correctly, Module._extensions was public API a long time ago; why was it made private? This feature looks very similar but perhaps less flexible.

Qard · 2017-04-13T22:46:03Z

I don't think it was ever possible to patch built-in modules with Module._extensions though.

As for why it was made private, I don't think it was ever strictly intended to be "public" in the first place. It just happened to not use an underscore in the early days of node, so people just assumed it was fair game to mess with it.

Also, Module._extensions doesn't receive the code text, so it's never really been possible to modify the pre-compile content without patching the _compile function, which was probably never a good idea, it being even more certainly private...

refack · 2017-04-14T01:06:40Z

lib/internal/module.js

+    return next;
+  }, content);
+}
+


So you reimplemented EventEmitter, only with an explicit return. Do you really feel it's worth the duplication?
You could instead just wrap it, and win!

var EventEmitter = require('events');
var ee = new EventEmitter();
var tMap = new WeakMap(); // maps each transform to its listener wrapper
function applyTransforms(content, filename) {
  var e = {content, filename};
  ee.emit('moduleLoading', e);
  return e.content;
}
function addTransform(transform) {
  var t = (e) => { e.content = transform(e.content, e.filename); };
  tMap.set(transform, t);
  ee.addListener('moduleLoading', t);
}
function removeTransform(transform) {
  ee.removeListener('moduleLoading', tMap.get(transform));
}

Event emitters are much bigger and more complicated than this needs.

That event emitter implementation has a lot more closures and object allocations. Event emitters also inherently have a much more complicated lookup scheme.

As require is a major startup hot-path, it's best not to put event emitters in the middle of that. Keep in mind this code would typically run many thousands of times during app startup.

premature optimization is the root of all evil...
IMHO code duplication is worse than a minor performance impact. I'm sure the actual compile is orders of magnitude heavier.
Also EE are super fast when there are no listeners.
Let's test and see...

Certainly code duplication is a thing to avoid generally, but the code to wrap an event emitter around it has just as much surface area as the custom implementation itself, without even taking into account all the extra stuff running inside the event emitter implementation. It's also a lot more difficult for someone unfamiliar with the code to understand at first glance. Were this something more complicated, an event emitter implementation would definitely make more sense, but it has little advantage here.

Also, the event emitter implementation is 2.5-3 times slower and has 11 times the memory footprint. It's not premature optimization, it's carefully considered and diligently measured optimization. 😉

Empirical measurement wins (even named my first company Empeeric)!

Changes were discussed, but PR is not yet complete...

Qard · 2017-04-14T04:07:54Z

I just wanted to elaborate a bit on my own personal reasoning for this PR:

My use for this is as an alternative to monkey-patch-based performance monitoring instrumentation. Typically in a performance monitoring agent, it will apply instrumentation by wrapping functions in closures that add behaviour before running the original function and before running the original callback. This creates many, many closures for every function that is wrapped. In a heavily instrumented environment, the extra performance impact at high load can become problematic.

There's also a lot of risk in wrapping code at runtime due to the potential for shifting, added or removed parameters, function.length checks, accidentally triggering getters/setters, inability to reliably inspect internal behaviour programmatically to inform decisions on where to place instrumentation hooks, etc.
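One of those risks, the `function.length` check, is easy to reproduce. The sketch below uses a made-up `query` function and a generic `wrap` helper; it is not taken from any particular APM agent.

```js
'use strict';

// Original function declares two parameters, so its length is 2.
function query(sql, callback) {
  callback(null, `ran: ${sql}`);
}

// Typical monkey-patch: wrap the function in a closure that adds timing.
function wrap(original) {
  return function wrapped(...args) {
    const start = process.hrtime();
    const result = original.apply(this, args);
    const [s, ns] = process.hrtime(start);
    wrapped.lastDurationNs = s * 1e9 + ns;
    return result;
  };
}

const patched = wrap(query);
console.log(query.length);   // 2
console.log(patched.length); // 0
```

Any caller that inspects `fn.length` (for example to decide whether a handler takes a callback) sees a different value after patching, and every wrapped function costs one extra closure.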

My hope is to provide a very usable alternative to the typical monkey-patching style through AST manipulation. This would eliminate most code behaviour quirks and would remove all those closures, taking a lot of extra indirection out of potentially numerous hot-paths in the execution of a user's app. This would enable much safer and lower impact real-time production monitoring.

Qard · 2017-04-15T17:54:32Z

@thefourtheye Oh, good catch. I'll see if I can find some time to fix it later tonight (visiting family for Easter right now). Any thoughts on the concept though?

Qard · 2017-04-15T22:00:43Z

Fixed the missing semi-colons and type info in the docs.

TimothyGu · 2017-04-15T22:10:47Z

lib/internal/module.js

+const transforms = [];
+
+function addTransform(transform) {
+  transforms.push(transform);


Assert that transform is a function here.
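A guard along those lines might look like this sketch. The error message wording is an assumption, and a plain `TypeError` is used rather than any internal error helper.

```js
'use strict';

const transforms = [];

function addTransform(transform) {
  // Reject anything that cannot be called later by the pipeline.
  if (typeof transform !== 'function') {
    throw new TypeError('"transform" argument must be a function');
  }
  transforms.push(transform);
}
```

Failing at registration time keeps the error close to the faulty caller instead of surfacing on the next `require()`.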

TimothyGu · 2017-04-15T22:11:23Z

doc/api/globals.md

@@ -245,6 +245,55 @@ added: v0.3.0
 Use the internal `require()` machinery to look up the location of a module,
 but rather than loading the module, just return the resolved filename.

+### require.addTransform(transform)
+
+* `transform` {function} A function to use to transform module text given the


{Function}

TimothyGu · 2017-04-15T22:11:31Z

doc/api/globals.md

+
+### require.removeTransform(transform)
+
+* `transform` {function} A function previously given to `require.addTransform()`


TimothyGu · 2017-04-15T22:12:34Z

lib/internal/module.js

+  return transforms.reduce((content, transform) => {
+    const next = transform(content, filename);
+    if (typeof next !== 'string') {
+      throw new Error('Module transforms must return the modified content');


Is there a way to make it known which module transform is faulty?

Could always attach the failing function to the error object before throwing.
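As a sketch of that suggestion: the `transform` and `filename` property names on the error are hypothetical, and `transforms` is passed in as a parameter only to keep the example self-contained.

```js
'use strict';

function applyTransforms(transforms, content, filename) {
  return transforms.reduce((current, transform) => {
    const next = transform(current, filename);
    if (typeof next !== 'string') {
      const err = new Error(
        'Module transforms must return the modified content');
      // Hypothetical properties identifying the faulty transform.
      err.transform = transform;
      err.filename = filename;
      throw err;
    }
    return next;
  }, content);
}

// A faulty transform that forgets to return its result.
function forgetsToReturn(content) { content.trim(); }

try {
  applyTransforms([forgetsToReturn], 'x', 'a.js');
} catch (err) {
  console.log(err.transform.name); // forgetsToReturn
}
```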

TimothyGu · 2017-04-15T22:17:42Z

I'm okay with the idea, and I recall some third-party module on npm doing something similar. However:

1. Should we support source maps?
2. How does this interact with existing module._extensions-based addons?
3. How does this work with non-.js files?

Qard · 2017-04-15T23:07:07Z

1. My feeling is source maps should probably be left to userland, but I haven't put much thought into that.
2. and 3. As this is triggered by module._compile(), anything that calls that should work. Currently that is only the .js extension, so supporting other extensions would require either copying the .js extension handler or writing your own which calls module._compile() internally. (Which is not ideal, being a private method...)
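To make the "write your own handler" option concrete, here is a rough sketch. The `.myjs` extension and the temporary file are invented for the example; `require.extensions` is deprecated and `module._compile()` is private, so this only shows the shape of the workaround, not a recommended practice.

```js
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical extension, chosen only for this sketch.
require.extensions['.myjs'] = (module, filename) => {
  const content = fs.readFileSync(filename, 'utf8');
  // Private API: anything that reaches _compile() would go through
  // the transform pipeline proposed in this PR.
  module._compile(content, filename);
};

// Write a throwaway module to a temp directory and load it.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'ext-'));
const file = path.join(dir, 'hello.myjs');
fs.writeFileSync(file, 'module.exports = "hello";');
console.log(require(file));
```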

Perhaps there should be some way to alter the extension too, which could change the behaviour of the pipeline? Not sure how that'd work exactly...

This is a simple pipeline to enable applying transform functions to loaded JavaScript files for various purposes such as transpiling, creating test mocks, recording code coverage or applying custom instrumentation.

azu · 2017-04-16T00:45:14Z

doc/api/globals.md

+
+
+```js
+require.addTransform((content, filepath) => {


filename?

pmuellr · 2017-04-17T12:55:35Z

Should we support source maps?

Seems like it would "just work" if the transforms did all the work themselves: both generating the sourcemap (obviously), but also reading and interpreting any incoming sourcemaps. The onus is on the transform to generate the sourcemap data anyway; it couldn't be done outside the transform.

I could imagine a couple of ways sourcemap handling might be made easier for transforms, but seems pretty clear we'd have to play with this quite a bit to figure out how to do it right in practice. Eg, I'd bet you don't want "full" sourcemap support in production, but might want enough for stack trace generation.

But as a general "extensibility" concern, I wonder if the incoming parameters and outgoing response of the transform should be ... more extendable. Eg, have the transform return {content: transformedContent} to allow for more things to be returned later.
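One way to picture that more extensible shape is sketched below. Everything beyond the `content` field is an assumption made for illustration, including the `sourceMap` field and passing a single object into each transform.

```js
'use strict';

// Each transform receives and returns an object, so extra fields
// (for example a hypothetical `sourceMap`) can be added later
// without changing the function signature.
function applyTransforms(transforms, content, filename) {
  let state = { content, filename };
  for (const transform of transforms) {
    const result = transform(state);
    if (result === null || typeof result !== 'object' ||
        typeof result.content !== 'string') {
      throw new TypeError(
        'Transform must return an object with a string `content`');
    }
    // Carry forward fields from earlier transforms unless overridden.
    state = Object.assign({}, state, result);
  }
  return state;
}

const out = applyTransforms(
  [({ content }) => ({ content: content + '\n', sourceMap: null })],
  'x',
  'a.js'
);
console.log(out);
```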

evanlucas

I'm really excited about this. Hopefully, we can add it without any performance impact

evanlucas · 2017-04-17T22:39:36Z

lib/module.js

@@ -537,6 +537,8 @@ var resolvedArgv;
 // the file.
 // Returns exception, if any.
 Module.prototype._compile = function(content, filename) {
+  content = internalModule.applyTransforms(content, filename);


I wonder what the performance impact is here.

evanlucas · 2017-04-17T22:44:43Z

doc/api/globals.md

@@ -245,6 +245,49 @@ added: v0.3.0
 Use the internal `require()` machinery to look up the location of a module,
 but rather than loading the module, just return the resolved filename.

+### require.addTransform(transform)


Does this work on builtin modules?

jasnell · 2017-04-18T13:34:10Z

I can't really say that I'm a fan of this, largely because I do not see why it needs to be in core. Perhaps I just need to understand the use cases more.

There are also questions about how this would interplay with ES6 modules, debugging, monkey patching, and so on.

An EPS might be a good way to describe the feature and use cases in depth while you're working on this, so that we can better understand it.

mhdawson · 2017-04-20T15:11:18Z

I had the same thought that an EPS would be a good way to doc/explain why it needs to be in core.

Having said that, I also understand the desire to get feedback before deciding to do that, and code is often the easiest way to do that. So just to say: if you do want to propose adding it, it's a significant enough addition to warrant an EPS, but it is interesting to discuss in advance as well.

What you have looks interesting to me...

Qard · 2017-04-20T16:31:28Z

Yep, I agree with the EPS suggestion. I just wasn't sure yet exactly what the shape of a proposal would be, so I put together some code that solves my particular need as a proof-of-concept. I don't mind if this PR gets rejected, just wanted to spark some conversation. 👍

Trott · 2017-05-01T14:05:48Z

Since this has been referred to the EP process, I'm going to remove the ctc-review label.

BridgeAR · 2017-09-12T20:14:26Z

@Qard I know that this is blocked by the EPS, but as there has been no progress in that thread for a long time, is this something you still want to follow up on?

Qard · 2017-09-12T20:38:26Z

I still want the feature myself, but it doesn't seem like there's much interest in it. Maybe worth reevaluating after ESM has been out for a bit.

BridgeAR · 2017-09-20T00:31:07Z

@Qard ok, I am not certain what we should do with the PR in the meantime. Would you want to keep it open or is it fine to close it?

Qard · 2017-09-20T00:49:17Z

Closing. I'll re-evaluate later to see how it applies with ESM stuff now existing.

src: let preloaded modules set source transformer

f0b9f42

This feature simply passes the content and filename through a transformer function which could be used for things like applying AST transforms to the source before it is run.

nodejs-github-bot added the lib / src and module labels Apr 11, 2017

refack reviewed Apr 12, 2017

refack added the ctc-review label Apr 12, 2017

jasnell added the wip label Apr 12, 2017

sam-github suggested changes Apr 12, 2017

doc: Document process.moduleWrap()

6b6ca10

Qard force-pushed the preload-module-extras branch from 3fd4099 to 5df01fe Compare April 13, 2017 22:18

Qard changed the title ~~src: let preloaded modules set source transformer~~ src: add require transform pipeline Apr 13, 2017

refack previously requested changes Apr 14, 2017

refack force-pushed the master branch from 16073c0 to fbe946b Compare April 14, 2017 04:12

Qard force-pushed the preload-module-extras branch from 5df01fe to 22e91bc Compare April 15, 2017 21:58

TimothyGu reviewed Apr 15, 2017

Qard force-pushed the preload-module-extras branch from 22e91bc to cc6083d Compare April 15, 2017 23:09

src: add require transform pipeline

7f3a29a

This is a simple pipeline to enable applying transform functions to loaded JavaScript files for various purposes such as transpiling, creating test mocks, recording code coverage or applying custom instrumentation.

Qard force-pushed the preload-module-extras branch from cc6083d to 7f3a29a Compare April 15, 2017 23:10

azu reviewed Apr 16, 2017

Trott mentioned this pull request Apr 17, 2017

Node.js Foundation Core Technical Committee (CTC) Meeting 2017-04-19 nodejs/CTC#105

Closed

refack mentioned this pull request Apr 17, 2017

build: allow easier checking of permanent deoptimizations #12456

Merged

3 tasks

evanlucas reviewed Apr 18, 2017

Trott mentioned this pull request Apr 24, 2017

Node.js Foundation Core Technical Committee (CTC) Meeting 2017-04-26 nodejs/CTC#111

Closed

Trott mentioned this pull request May 1, 2017

Node.js Foundation Core Technical Committee (CTC) Meeting 2017-05-03 nodejs/CTC#117

Closed

Trott removed the ctc-review label May 1, 2017

Qard mentioned this pull request May 6, 2017

proposal: Staged require() with lifecycle hooks nodejs/node-eps#56

Closed

refack mentioned this pull request Aug 20, 2017

Reduce pseudo internal Node APIs standard-things/esm#66

Closed

BridgeAR added the stalled label Sep 12, 2017

Qard closed this Sep 20, 2017

Qard deleted the preload-module-extras branch August 11, 2021 06:31

