Method for nodes to register dependencies programmatically #26

Open
Rich-Harris opened this issue Dec 9, 2014 · 10 comments

Comments

@Rich-Harris
Contributor

Gobble build definitions are defined as a sequence of transformations from source to result, rather than (as in JavaScript bundlers like webpack, browserify, etc.) by taking some final entry point and recursively discovering its dependencies. Generally speaking that's a more appropriate model for the (more general purpose) job Gobble is trying to do.

There are some occasions when it would be useful for a transformer to be able to declare its dependencies (that are outside its inputdir) programmatically. For example, if you have a bunch of markdown documents that you want to turn into HTML pages, you probably have an HTML template to wrap around the content:

var fs = require( 'fs' ),
    marked = require( 'marked' ),
    gobble = require( 'gobble' );

var template = fs.readFileSync( 'template.html', 'utf-8' );
var html = gobble( 'markdown' ).transform( function ( markdown ) {
  return template.replace( '__CONTENT__', marked( markdown ) );
});

In this example the html node would contain the expected result, but if you changed the template.html file, Gobble wouldn't know that it needed to re-run the transformation.

A few quick API ideas:

// synchronous `gobble.read()` function returns file contents (caches the result
// in memory so it doesn't hit the filesystem each time, obvs) and marks it
// as a dependency
var html = gobble( 'markdown' ).transform( function ( markdown ) {
  return gobble.read( 'template.html' ).replace( '__CONTENT__', marked( markdown ) );
});

// `.dependsOn(filename)` method registers dependency and returns `this`. Allows
// explicitly declaring dependencies in the build definition in an efficient way, but
// doesn't allow programmatic declaration. Also, no easy caching
var html = gobble( 'markdown' ).transform( function ( markdown ) {
  var template = fs.readFileSync( 'template.html', 'utf-8' );
  return template.replace( '__CONTENT__', marked( markdown ) );
}).dependsOn( 'template.html' );

// dependency is explicitly declared inside the function, webpack style. Again,
// no help with caching
var html = gobble( 'markdown' ).transform( function ( markdown ) {
  this.addDependency( 'template.html' );
  var template = fs.readFileSync( 'template.html', 'utf-8' );
  return template.replace( '__CONTENT__', marked( markdown ) );
});

// file node has a `.contents()` method which returns the contents and
// registers the dependency
var template = gobble( 'template.html' ); // depends on issue #23
var html = gobble( 'markdown' ).transform( function ( markdown ) {
  return template.contents().replace( '__CONTENT__', marked( markdown ) );
});

(A synchronous API is preferable I think, since it means it can be used with both directory and file transformers.)
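To make the first idea concrete, here is a minimal sketch of a synchronous, caching reader that also records what it has read. Everything in it is an assumption for illustration: `createReader`, `read`, `invalidate` and `dependencies` are hypothetical names, not part of Gobble, and a real implementation would have to tie invalidation to Gobble's file watching.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper, not part of Gobble: caches file contents in memory and
// records every file read through it as a dependency.
function createReader() {
  const cache = new Map();        // absolute path -> file contents
  const dependencies = new Set(); // absolute paths read so far

  return {
    read(file) {
      const abs = path.resolve(file);
      dependencies.add(abs);
      if (!cache.has(abs)) {
        cache.set(abs, fs.readFileSync(abs, 'utf-8'));
      }
      return cache.get(abs);
    },
    // Meant to be called when a watcher reports a change, so that the next
    // read goes back to the filesystem.
    invalidate(file) {
      cache.delete(path.resolve(file));
    },
    dependencies
  };
}

// Usage with a throwaway template file in the OS temp directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'reader-'));
const templatePath = path.join(dir, 'template.html');
fs.writeFileSync(templatePath, '<main>__CONTENT__</main>');

const reader = createReader();
const page = reader.read(templatePath).replace('__CONTENT__', '<p>hello</p>');
```

Because the cache is keyed by path, the sketch only stays correct if something calls `invalidate()` when the file changes on disk. That is the part a build tool would need to own.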

None of them leap out as being The Answer - would be interested if anyone out there has any feedback. Figuring this out would mean never again having to use static site generators (which in my experience compensate for their lack of intrinsic flexibility with overwhelmingly complex configuration).

/cc @OliverJAsh and @theefer, since this is related to the stuff we were talking about a while back. Have you encountered this problem with Plumber?

@evs-chris
Contributor

I would like this. Of those options, I would lean toward the last one.

What about something added to the context of the transform?

var html = gobble( 'markdown', { files: [ 'template.html' ] } )
  .transform( function ( markdown ) {
    return this.files[ 'template.html' ].replace( '__CONTENT__', marked( markdown ) );
  });
// 'files' may be better as 'dependencies' or some other appropriate term

There's probably some extra benefit to number 4 above though, as the template.html could be transformed before it is used in the markdown transform.

@theefer

theefer commented Jan 4, 2015

There is currently no operation that does this in Plumber, but it should indeed be possible, since all operations are chunks of a pipeline through which a new execution flows when files change. I've been wanting to do that for plumber-jshint (re-run if jshintrc changes) for instance.

For your example, it would probably look something like:

var template = glob('template.html');
pipelines[''] = [
  glob('markdown/**'),
  markdown(template),
  write('out')
];

(Or else markdown('template.html') where the markdown operation would be responsible for calling something like glob on its argument internally, but that bloats the plugin imo.)

Either way, I guess the point is that it's nice to be able to rely on the standard API for reading out a file (asynchronously) and being notified when it changes. Option 4 above seems the closest, as you're just gobble()ing the file and getting a standard node out of it. The other options (esp. 2 and 3) are more imperative and, as I think you mention, it's not clear how you'd do it programmatically (e.g. look up a jshintrc file in parent directories without using blocking fs calls).

Once you have both the template and the MD files as top-level objects (node, streams, whatever) that emit files initially and on change, you need a way to combine ("merge") them back together into a single node. I'm not familiar enough with Gobble, but it seems like you already have that ability (conceptually at least), albeit just writing all the input node contents into the same output dir.

I'm probably being stupid, but would something like this work technically?

var template = gobble( 'template.html' );
var files = gobble( 'markdown' );
var html = gobble([template, files]).transform( function ( markdownOrTemplate ) {
  if (markdownOrTemplate.filename !== 'template.html') {
    var template = fs.readFileSync( 'template.html', 'utf-8' );
    return template.replace( '__CONTENT__', marked( markdownOrTemplate ) );
  }
});

Now this is of course terrible and inefficient code, but I'm just trying to grasp the basic building blocks :-)

If this is not completely misled, it would seem that the issue is that the only way to merge nodes doesn't allow differentiating which node the files came from. Put otherwise, a transform can only really take a single node as input (source, merger or another transform node), but really here we want two.

So template.contents() in your option 4 seems to be the closest thing. Would you have a way to cause this code to register template as another dependency (or "input") to the transform node? I'm curious how that works. More importantly, it looks like you would have to make the contents() method return the contents synchronously; how would that work? Would you rely on having registered template as an input of the transform and only execute the transform body when both inputs are ready (so the data is available synchronously too)? And re-execute the body if any of the inputs fires?
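One possible answer to the registration question is implicit tracking: while a transform body runs, any node whose `contents()` is called records that transform as a dependent. The sketch below is a toy in-memory model under that assumption. `FileNode`, `Transform` and `activeTransform` are hypothetical names, not Gobble internals, and it sidesteps the synchronous-read question by assuming each node's contents are already loaded before the body executes.

```javascript
// Hypothetical model of implicit dependency tracking; not Gobble's internals.
let activeTransform = null; // the transform whose body is currently executing

class FileNode {
  constructor(contents) {
    this.value = contents;       // assumed to be loaded already
    this.dependents = new Set(); // transforms that have read this node
  }

  contents() {
    // Reading during a transform registers that transform as a dependent.
    if (activeTransform) this.dependents.add(activeTransform);
    return this.value;
  }

  update(contents) {
    this.value = contents;
    // Iterate over a copy, since re-running a body registers it again.
    [...this.dependents].forEach(transform => transform.run());
  }
}

class Transform {
  constructor(body) {
    this.body = body;
    this.runs = 0;
    this.run();
  }

  run() {
    const previous = activeTransform;
    activeTransform = this;
    try {
      this.runs += 1;
      this.output = this.body();
    } finally {
      activeTransform = previous;
    }
  }
}

const template = new FileNode('<main>__CONTENT__</main>');
const rendered = new FileNode('<p>hello</p>'); // stands in for rendered markdown

const html = new Transform(() =>
  template.contents().replace('__CONTENT__', rendered.contents())
);

// Changing the template re-executes the body that read it.
template.update('<article>__CONTENT__</article>');
```

In this model the transform effectively has two inputs, and the body re-executes when either one fires, which is the behaviour the questions above describe.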

@msegado

msegado commented Apr 27, 2015

+1 on this. My current use case is slightly different, though: I have a bunch of independent Sass files (one per "widget" in a library of widgets) which all import a common library. While a directory transformer would handle this, I'd prefer to use a file transformer to avoid recompiling all of the widget files when just one changes, and would then need a way for them to register the library as a dependency (as well as a way to feed the library to Sass, for which it would help if I could register an entire directory as a dependency).
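As a rough illustration of that use case, the bookkeeping could be a map from each file to the files or directories it depends on, consulted whenever something changes. This is only a sketch: `register` and `affectedBy` are hypothetical helpers, the file names are made up, and nothing here calls Sass or Gobble.

```javascript
const path = require('path');

// Hypothetical registry: file -> list of dependencies (files or directories).
const dependencies = new Map();

function register(file, deps) {
  dependencies.set(path.normalize(file), deps.map(dep => path.normalize(dep)));
}

// Returns the registered files that need rebuilding when `changed` changes:
// the file itself, plus any file that depends on it or on a directory above it.
function affectedBy(changed) {
  const target = path.normalize(changed);
  const affected = [];
  for (const [file, deps] of dependencies) {
    const hit = file === target || deps.some(
      dep => target === dep || target.startsWith(dep + path.sep)
    );
    if (hit) affected.push(file);
  }
  return affected;
}

// Two widgets import the shared library directory; one does not.
register('widgets/button.scss', ['lib']);
register('widgets/modal.scss', ['lib']);
register('widgets/plain.scss', []);

const afterLibChange = affectedBy('lib/_mixins.scss');
const afterWidgetChange = affectedBy('widgets/plain.scss');
```

Editing one widget rebuilds only that widget, while editing anything under `lib` rebuilds every widget that registered the directory.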

@Rich-Harris
Contributor Author

Yeah, me too, I keep circling back round to this. Occasionally I'll find a spare 45 minutes and start hacking around on it, before realising it's going to take a lot more than a spare 45 minutes...

@aubergene

You can already return an object with a map property which has an array of sources. Couldn't this be used? Then anything which already returns a sourceMap would automatically register its dependencies.

@aubergene

I realise that sources doesn't necessarily contain all dependencies, but I still like the simplicity that you just return an array.

var html = gobble( 'markdown' ).transform( function ( markdown ) {
  // `template` is a plain string here, so call `.replace()` on it directly
  var template = fs.readFileSync( 'template.html', 'utf-8' );
  return {
    code: template.replace( '__CONTENT__', marked( markdown ) ),
    deps: ['template.html']
  };
});
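On the consuming side, the build tool would need to normalise what a transformer returns and remember the `deps` array. A minimal sketch of that, assuming the `{ code, deps }` shape proposed above (`runTransform` and `needsRerun` are hypothetical helpers, not Gobble functions):

```javascript
// Hypothetical runner: accepts either a plain string or { code, deps }.
function runTransform(transformer, input) {
  const result = transformer(input);
  if (typeof result === 'string') {
    return { code: result, deps: [] };
  }
  return { code: result.code, deps: result.deps || [] };
}

// A changed file triggers a re-run only if the last run declared it.
function needsRerun(record, changedFile) {
  return record.deps.includes(changedFile);
}

const template = '<main>__CONTENT__</main>'; // stands in for template.html's contents

const record = runTransform(function (markdown) {
  return {
    code: template.replace('__CONTENT__', markdown),
    deps: ['template.html']
  };
}, '<p>hello</p>');
```

Transformers that return a bare string keep working unchanged, which matches the appeal of "you just return an array".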

@msegado

msegado commented Dec 7, 2015

Hmm, I do wonder about @Rich-Harris's comments on caching, though - any time we're doing the reading ourselves, we can't easily cache the file.

On the other hand, if we're trying to support external tools that are based around reading from disk instead of accepting arguments as strings (e.g., Webpack), we'd probably want Gobble to just register the dep without reading from the filesystem at all.

One fairly crazy approach would be to handle caching down at the filesystem level... something like what https://www.npmjs.com/package/cachedfs does with its patchInPlace functionality (though cachedfs is read-only at the moment and doesn't do cache invalidation). That way, it doesn't matter if plugins hit the disk multiple times. Seems more like a job for a standalone module though, rather than a build system like Gobble...

@msegado

msegado commented Dec 7, 2015

(Actually, that's arguably not even a module's job, but rather the OS's job... which makes me wonder, how much does manual caching actually matter if the OS is caching in memory anyway? Is there really a significant overhead from the system calls that still makes in-JavaScript caching favorable compared to just calling fs.* and letting the OS deal with it?)
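One way to get a feel for the answer is to measure it. The script below is a rough sketch rather than a rigorous benchmark: it times repeated `fs.readFileSync` calls against a `Map`-backed cache for a single small file. The file name, size and iteration count are arbitrary assumptions, and results will vary by OS, disk and Node version.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Throwaway file to read; contents and size are arbitrary.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'bench-'));
const file = path.join(dir, 'template.html');
fs.writeFileSync(file, '<main>__CONTENT__</main>'.repeat(1000));

// Runs `fn` repeatedly and reports the elapsed wall-clock time in ms.
function time(label, fn, iterations) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i += 1) fn();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(label + ': ' + ms.toFixed(1) + 'ms for ' + iterations + ' reads');
  return ms;
}

const cache = new Map();
function cachedRead() {
  if (!cache.has(file)) cache.set(file, fs.readFileSync(file, 'utf-8'));
  return cache.get(file);
}

const direct = time('fs.readFileSync', () => fs.readFileSync(file, 'utf-8'), 2000);
const cached = time('in-memory cache', cachedRead, 2000);
```

Whatever the numbers turn out to be, they only cover raw file reads, not the cost of re-running transformers.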

@aubergene

I like Rich's general approach of Gobble optimising for developer sanity. A file transform has a list of watched files which triggers its build and it's up to the plugin how it performs the transform.

@msegado

msegado commented Dec 8, 2015

Yeah, I definitely also like the approach... my ramblings were more about how to handle caching of the additional dependencies (which gets harder if those don't go through Gobble), and whether that gains us much performance or not given OS-level filesystem caching. I just realized that I've been confusing two types of "caching" here though: (1) filesystem IO caching, which we probably don't need to worry about too much, and (2) caching of intermediate transform results, which probably matters more (i.e., we want to reuse the output of intermediate transformers whose inputs have not changed to avoid unnecessary work).
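The second kind of caching can be sketched as memoisation keyed on a hash of the input. This is a simplified illustration only: `memoizeTransform` is a hypothetical helper, it assumes a transformer whose output depends solely on its string input, and a version that handled external dependencies would have to fold their contents into the key as well.

```javascript
const crypto = require('crypto');

// Hypothetical wrapper: reuses a transformer's previous output whenever the
// input contents hash to a value it has already seen.
function memoizeTransform(transform) {
  const cache = new Map(); // sha256 of input -> transform output
  let misses = 0;

  function cached(input) {
    const key = crypto.createHash('sha256').update(input).digest('hex');
    if (!cache.has(key)) {
      misses += 1;
      cache.set(key, transform(input));
    }
    return cache.get(key);
  }

  cached.misses = () => misses;
  return cached;
}

// Stand-in for an expensive transformer.
const shout = memoizeTransform(text => text.toUpperCase());

const first = shout('hello');
const second = shout('hello'); // same contents, so the transformer is skipped
const third = shout('goodbye');
```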

This leads to another, hopefully more relevant thought: what if we want to transform any of the external deps before using them? To use Rich's markdown example from above: what if we wanted to do some find-and-replace step on the template before using it in the markdown transformer? That would favor the last of the four approaches Rich was suggesting earlier, where we explicitly Gobble that file and then have the transformer use/register it via something like node.contents().



