Skip to content
This repository has been archived by the owner on Sep 2, 2023. It is now read-only.

Validate exports keys and values #359

Closed
jkrems opened this issue Jul 23, 2019 · 39 comments
Closed

Validate exports keys and values #359

jkrems opened this issue Jul 23, 2019 · 39 comments

Comments

@jkrems
Copy link
Contributor

jkrems commented Jul 23, 2019

We currently do not put any restrictions on keys and values beyond "whatever works in the current implementation". There should be a clear, non-ambiguous definition what happens for every possible key and value.

Worth noting that import maps explicitly treat the pattern and specifier values as arbitrary data. They are not restricted to valid filesystem or URL characters:

Screen Shot 2019-07-25 at 9 05 36 AM

@jkrems
Copy link
Contributor Author

jkrems commented Jul 24, 2019

Previous discussion: jkrems/proposal-pkg-exports#6

@jkrems
Copy link
Contributor Author

jkrems commented Jul 24, 2019

There's 3 different "data types" at work:

  • (Bare) specifier, maybe the concatenation of name and expansion.
  • The pattern (or key).
  • The target (or value).
  • The resolvedURL.

An approach that seems consistent with how import map validation works could be:

  1. optional Normalize specifier, e.g. by collapsing directory separators or converting \\ into /.
    This step is mostly relevant to CJS and Windows where we have to support certain existing specifier formats.
  2. Parse specifier into name and expansion. Validate name and expansion.
  3. Match expansion to a pattern, resulting in a target. Validate target.
  4. Apply the pattern, yielding a resolvedURL. Validate resolvedURL.

The nature of the validation (and what a failed validation means) is TBD here.

@jkrems
Copy link
Contributor Author

jkrems commented Jul 24, 2019

Known possible target values:

  • string, relative URL starting with ./.
  • string, relative URL starting with ../. Current consensus: Disallowed. May throw?
  • string, relative URL starting with /. Current consensus: Disallowed. May throw?
  • string, absolute URL with a supported scheme.
  • string, absolute URL with an unsupported scheme.
  • array, each element a non-array and non-null value.
  • null, alias for [].
  • boolean. TBD.
  • number. TBD.
  • object. TBD.

@jkrems
Copy link
Contributor Author

jkrems commented Jul 25, 2019

For pattern values, there are few restrictions. Unrecognized patterns would just not do anything helpful. In a way the definition of a useful pattern value is "anything that could be matched by a valid expansion". The only restriction is: target and pattern have to either both end in a slash ("/") or both not end in a slash ("/").

@jkrems
Copy link
Contributor Author

jkrems commented Jul 25, 2019

If we want to prevent easy loopholes, especially on Windows, we would have to do some limited normalization on specifier values before determining the name. We can protect the exports in the following cases, specifically when the specifier is coming from CJS:

  • non-exports-pkg/../exports-pkg/foo.js (path traversal loophole)
  • exports-pkg\foo.js (Windows loophole)
  • @scope\name/foo.js (Windows loophole, scope edition)

In ESM we'd reject those as invalid specifiers, so the issue doesn't exist. But in CJS we generally fall back on the old resolution when no exports are found.

@bmeck
Copy link
Member

bmeck commented Jul 25, 2019

Why is escaping the current package boundary a "loophole"?

@jkrems
Copy link
Contributor Author

jkrems commented Jul 25, 2019

It's a "loophole" as in "you're still using bare specifiers but you circumvent exports". But I scaled down the language to call out that preventing those circumventions is optional.

@bmeck
Copy link
Member

bmeck commented Jul 25, 2019

if it is a concern couldn't you negotiate whenever you cross into a package boundary? would be a lot of IO though.

@jkrems
Copy link
Contributor Author

jkrems commented Jul 25, 2019

Yeah, it would also muddy the waters of what exports are. Because right now they don't really apply to "directories" but to "package names". Since this is only an issue in CJS (where we have to support old behavior), I'd rather keep it simple. E.g. "we accept that it's possible" or "we normalize and prevent these known patterns".

@bmeck
Copy link
Member

bmeck commented Jul 25, 2019

How does this not have the same problem in ESM?

@jkrems
Copy link
Contributor Author

jkrems commented Jul 25, 2019

I thought somebody (@guybedford?) mentioned that we'd throw a ERR_INVALID_MODULE_SPECIFIER for those kinds of specifiers but I can't find any mention of it now. :( So I guess we still have to define behavior for it, either accepting the same kinds of issues as with CJS or making those illegal in ESM (e.g. .. segments and backslashes).

EDIT: The only thing we do reject is backslashes in scopes (Invalid package name '@foo\bar').

@bmeck
Copy link
Member

bmeck commented Jul 25, 2019

@jkrems of note, | is also a character that seems to be in conflict with import maps/layered API fallback strings

@jkrems
Copy link
Contributor Author

jkrems commented Jul 25, 2019

@bmeck Thanks, I wasn't aware of that! Do you happen to have a link about the pipe / | character in import maps? Tried to search through the usual repos and couldn't find more details.

@bmeck
Copy link
Member

bmeck commented Jul 25, 2019

@jkrems the conflict actually comes from https://github.com/drufball/layered-apis#solution but the specifiers from layered APIs would be split on | prior to being import mapped

@jkrems
Copy link
Contributor Author

jkrems commented Jul 25, 2019

Ah, I remember now. My understanding was that because of backwards compat, the focus shifted to "import map mapping the polyfill URL to the built-in module" instead of a new URL format. I found this PR by @domenic that migrates the content from | to import maps without |.

@domenic Do you have a gut feeling if | will stick around at the end? Or did you move past that approach?

@guybedford
Copy link
Contributor

Import maps are entirely based on checking / and applying the URL parser. There are no other string validations currently, only type validations.

I think aligning with that probably makes the most sense, and to be honest, banning any type of backtracking seems out of scope to me.

@robpalme
Copy link
Contributor

I'd like to suggest banning back-tracking should be in scope. The whole point of bare-specifiers is to permit indirection into a target package . The importer doesn't need to know precisely where the target is. Jumping outside of the target package implies the caller now becomes dependent on the location of the package and what surrounding modules exist.

@domenic
Copy link

domenic commented Jul 29, 2019

@domenic Do you have a gut feeling if | will stick around at the end? Or did you move past that approach?

There are no current plans to work on import specifiers with |s in them. We should really turn down that repository.

@jkrems
Copy link
Contributor Author

jkrems commented Jul 29, 2019

@domenic Thanks for confirming! :)

I think aligning with that probably makes the most sense, and to be honest, banning any type of backtracking seems out of scope to me.

I think I agree. We shouldn't make up restrictions that don't exist in import maps, in general. The only kind of backtracking I'd like to rule out is an explicit exports mapping of a directory prefix and a specifier that breaks out of the directory. E.g. exports-pkg/exported-dir/../random-file.js.

@ljharb
Copy link
Member

ljharb commented Jul 29, 2019

exports isn’t the same thing as import maps tho - import maps have no concept of a package, whereas exports are highly tied to the concept.

@jkrems
Copy link
Contributor Author

jkrems commented Jul 29, 2019

exports isn’t the same thing as import maps tho - import maps have no concept of a package, whereas exports are highly tied to the concept.

I don't disagree but I'm not sure I follow the logic. To me not that's the reason why exports shouldn't care about directory traversal. They operate on package names, not on disk layout.

@GeoffreyBooth
Copy link
Member

I don’t see a use case for an export path referencing a file outside of the package. If anything that feels like a bug, potentially one with security concerns. If we can lock it down before "exports" gets unflagged I think that would be best. We can always open it back up if someone comes up with a use case and can prove that it is safe.

@jkrems
Copy link
Contributor Author

jkrems commented Jul 29, 2019

@GeoffreyBooth There's multiple valid URLs that are outside of the package boundaries:

  1. It may point to a standard or built-in module.
  2. It may point to an absolute URL, e.g. a data URL or an https URL as we land support for those.

Also main already allows pointing to URLs / paths outside of the package. It feels inconsistent to disallow it for exports. Yes, we allow package authors to shoot themselves in the foot but realistically we can't prevent that. They could also set the mapping to ./foo.js and check in a symlink. There's no safety win here afaict.

@ljharb
Copy link
Member

ljharb commented Jul 29, 2019

There’s no guarantee we’d ever land support for an https module; a data url or a bare specifier is different, i think, than a filesystem path outside the package directory.

@jkrems
Copy link
Contributor Author

jkrems commented Jul 29, 2019

I'm just not sure what we're protecting against here. The following are pretty much equivalent:

// map outside of boundary:
"exports": { "./foo.js": "../bar.js" }
// map outside of boundary, using intermediate file:
"exports": { "./foo.js": "./bar-proxy.js" }

// bar-proxy.js:
export * from '../bar.js';

Disallowing the former feels like security theater (creates the impression of added security without changing the actual properties).

@ljharb
Copy link
Member

ljharb commented Jul 29, 2019

It’s not about security to me; it’s about the likelihood of bugs. Requiring the intermediate file means someone has to really intend to make their package non-portable.

@jkrems
Copy link
Contributor Author

jkrems commented Jul 29, 2019

Right, but how likely is it really that somebody accidentally uses ../bar.js in an export? It's allowed in main and other current fields and I've never heard of a bug caused by it.

@guybedford
Copy link
Contributor

There seem to be three main phases we can apply validations:

  1. Validate the exports target before applying it, eg to ensure it has no /../ segments, or percent-encoded /../ segments (which only get discovered moving from URL -> path, and wouldn't necessarily be caught at the URL restriction phase!
  2. Validate the exports subpath the user themselves provide - similar to the validation above, but instead applied to the user-supplied subpath after the pkg/... part.
  3. Validate the final resolved URL to ensure it is within the expected package boundary. The issue here is that it might be difficult to debug, as we don't necessarily know whether it was the user-supplied part, or the "exports" part that resulted in the backtracking - maybe we can carefully design the error message to handle both. In addition, percent-encoded separators would still slip through to the file system here unless we explicitly have a phase to detect that too.

Since we know this is not a secure system - I'm tempted to say the simplest approach should win... instead of trying to comprehensively tackle every scenario.

The absolute simplest seems to me just to do nothing and have URL resolution, as @jkrems says no one has complained about this for "main".

But if we must introduce validation, please lets find a simple model for it with good errors, so we maintain a good developer experience.

@robpalme
Copy link
Contributor

As others have said, we're not aiming for a security boundary here. This is about preventing bugs and conveying intent (that this feature is to be used to deliver portable package encapsulation).

For @guybedford 's list, (1) is the most valuable validation because there is no known use-case for losing portability.

(2) is also valuable - not least because it would prevent any user confusion over whether back-tracking in a sub-path is resolved before or after pattern-matching. If a pkg is using exports, it means that importers ought to entering via well-defined sub-paths.

@bmeck
Copy link
Member

bmeck commented Jul 30, 2019

If we do apply these validation rules, we should expect to be able to relax them for things if we find use cases around things outside of the validation rules. This could also be linted ahead of time it seems and we don't apply the same restrictions to things like fs. Is the frequency of these specifiers resolving outside of the package common / in error? Or is there some use case?

I would definitely prefer we spend time researching things here rather than banning outright to begin with.

Per (1), the path could still be escaped using symlinks or the like. I don't think .. is a reliable means to know if something escapes the path boundary. I do think since (2) is a request it could be applied after normalizing, but think that .. should be normalized before validation.

Overall, I think adding these restrictions adds a lot of complexity for validations that I don't see users being able to reason about well on their own. I would prefer we leave this to linters given the complexity of understanding the restrictions, lack of data, and potential need to remove the errors in the future as use cases like loading different protocols is exposed.

I think after reading these comments, I am +1 to some naive validation (properly formatted strings), but -1 to complex package boundary enforcement.

@ljharb
Copy link
Member

ljharb commented Jul 30, 2019

fs isn’t package-related; the lack of these validations in main/bin has caused tons of bugs in the ecosystem over the years - it would be a shame to miss the opportunity to maximally validate. Anything correctness-related (as opposed to style-related) that the language or platform leaves to linters is imo also a failure, as it passes burden to a large number of downstream humans.

I think normalizing before validating is the proper approach; as long as the error message indicates which exports mapping is causing the problem it’ll be very useful.

@bmeck
Copy link
Member

bmeck commented Jul 30, 2019

@ljharb I'm unsure that these restrictions are correctness. What examples do you have of this in the ecosystem? Using linters alleviates the usefulness of doing things across package boundaries and runtime costs in both CPU and complexity. These fields are statically analyzable and do not have the complexity of JS linting even. I am just unsure of why we are applying these restrictions without clear reasoning. It seems an effort to contain packages within themselves, but packages can cross boundaries in all sorts of ways, not just to outer paths, but to inner packages as well. The validation seems arbitrary without evidence of common mistakes and reasons for people using code outside of their own package.

@ljharb
Copy link
Member

ljharb commented Jul 30, 2019

I have no concrete examples available; but my anecdotal data is that these sort of bugs do happen - in main occasionally, but more often in the browser field which is the closest analogue to exports.

We can always relax restrictions later; we can never tighten them. Nows the time to start with them because we have no other opportunity.

That there are other ways to cause problems doesn’t mean we have to let this abstraction be a leaky sieve too.

@bmeck
Copy link
Member

bmeck commented Jul 30, 2019

It seems fine if we are doing this purely precautionary but we should probably try to lay out the goals a bit clearer.

Are we trying to prevent any package boundary crossing ?

  • Inner packages can be access via things like ./node_modules/dep
  • Outer packages can be access via things like ../peer
    • peerDependencies ?
  • Vendored packages with a package.json
    • bundledDependencies ?
  • Absolute specifiers?
    • If they resolve inside the current package ?
    • These are not altered across installations ? Is this really about problems with installation graphs or about package boundaries in general ?

At what points are validation occuring:

  • Specifiers prior to resolution import("foo/../bar") "foo/../bar"
  • While doing path traversal?
    • If pkg's main exports ../peer but ../peer redirects to pkg/inner what happens since we still end up in the current package ? Same for peer instead of ../peer
      • Can these cause infinite loops ?
    • Symlinks behavior around --preserve-symlinks being on or off ?

Is this all done eagerly when searching for package.json or lazily upon importing a specific exports key? For most things I think lazy is good as it prevents slow startup, especially if it can be analyzed ahead of time.

If we can reach consensus on points like those above it seems fine since we can design the validation around the goals rather than ad-hoc.

@bmeck
Copy link
Member

bmeck commented Jul 30, 2019

It does seem that some frameworks at least are using ../ to escape their package boundaries on purpose such as polymer@3.

  • import the Polymer library from ./node_modules/@polymer/polymer/polymer-element.js in apps, and from ../@polymer/polymer/polymer-element.js in reusable elements. Now, apps and reusable elements can both import from @polymer/polymer/polymer-element.js.

@jkrems
Copy link
Contributor Author

jkrems commented Jul 30, 2019

Just because I had to do a double-take in shock: It looks like polymer did stop asking people to do that in v3 but yes, the implication is that before then people did routinely reach out to files outside of package boundaries.

@bmeck
Copy link
Member

bmeck commented Jul 30, 2019

To be clear, I'm fine banning w/e behavior we can reach consensus on even if there are some uses of it; however, we should do some research on usages.

@guybedford
Copy link
Contributor

Shall we add this as an agenda item to the meeting then?

@jkrems
Copy link
Contributor Author

jkrems commented Aug 13, 2019

A set of validations has now landed in the implementation via nodejs/node#28949.

@jkrems jkrems closed this as completed Aug 13, 2019
@MylesBorins MylesBorins removed the modules-agenda To be discussed in a meeting label Aug 14, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

8 participants