Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check-Licenses script can miss some incompatible dependencies #38461

Closed
mchowning opened this issue Feb 2, 2022 · 6 comments
Closed

Check-Licenses script can miss some incompatible dependencies #38461

mchowning opened this issue Feb 2, 2022 · 6 comments
Labels
Good First Issue An issue that's suitable for someone looking to contribute for the first time Needs Dev Ready for, and needs developer efforts [Package] Scripts /packages/scripts [Type] Bug An existing feature does not function as intended

Comments

@mchowning
Copy link
Contributor

mchowning commented Feb 2, 2022

Description

The check-licenses script has a couple of issues that failed to identify the incompatible license of an Apache-2.0 dependency I recently tried adding.

Specifically, I tried to add Shaka-Player version 2.5.23 to the project. The check-licenses script failed due to one of Shaka-Player's dependencies (eme-encryption-scheme-polyfill) having an Apache-2.0 license, but as I dug into this, I realized that Shaka-Player itself also had an Apache-2.0 license, but it wasn't flagged by the script.

I think there are two, somewhat separate issues that allowed Shaka-Player to slip through.

1. If the package.json file specifies an incompatible license, we proceed to look for a compatible license in the license files.

In this case, Shaka-Player's package.json file identified it as having an Apache-2.0 license. Instead of this causing Shaka-Player to be identified as incompatible, the script continued searching for compatible licenses.

2. When searching other license files for a compatible license, we only check if there is a compatible license anywhere in the file

Shaka-Player's license file clearly identifies it as having an Apache-2.0 license. The script, however, checks whether there is any compatible license anywhere in the file, and the license file does have a section at the end identifying that one of its dependencies uses the MIT license:

Contains code from https://github.com/mozilla/language-mapping-list

The MIT License (MIT)

Copyright (c) 2013 Ali Al Dallal

[TEXT OF MIT LICENSE]

The license-check script detects this, correctly identifies that MIT is a compatible license, and concludes that Shaka-Player is ok to include.

Is this a problem for our current dependencies?

I don't think so.

I went through and checked for any dependencies with incompatible licenses in their package.json file, and it doesn't look we have any dependencies where we are detecting a compatible license in the license file even though the license identified in the package.json file is incompatible (scenario 1 above).

I also did a check and confirmed we have no dependencies currently where the license file contains an Apache license reference but we're detecting a different license (scenario 2 above). That's certainly not a full-proof check (I only checked for Apache), but it's a pretty good indication that this isn't a major issue.

In light of that, I'm inclined to think of this as a bit of an edge case and probably not a high priority issue.

Step-by-step reproduction instructions

  1. Add shaka-player version 2.5.23 as a dependency.
  2. Run npm install
  3. Run npm run check-licenses
  4. Observe that although one of shaka-player's dependencies are identified as having incompatible licenses (eme-encryption-scheme-polyfill), the shaka-player dependency itself is not identified.
@mchowning mchowning added the [Package] Scripts /packages/scripts label Feb 2, 2022
@mchowning mchowning changed the title Check-Licenses script can some incompatible dependencies Check-Licenses script can miss some incompatible dependencies Feb 2, 2022
@gziolo gziolo added the [Type] Bug An existing feature does not function as intended label May 7, 2022
@gziolo gziolo added Good First Issue An issue that's suitable for someone looking to contribute for the first time Needs Dev Ready for, and needs developer efforts labels May 7, 2022
@bdurette
Copy link
Contributor

Digging into the relevant code a bit, first up, there is (rightfully) a stern warning in the file:

/*
 * WARNING: Changes to this file may inadvertently cause us to distribute code that
 * is not GPL2 compatible.
 *
 * When adding a new license (for example, when a new package has a variation of the
 * various license strings), please ensure that changes to this file are explicitly
 * reviewed and approved.
 */

Would love to understand the approval process for this specific file.

I think some goals for a fix, in priority order, should be:

  1. No more permissive than the current version.
  2. Minimize false-positives--especially against current dependencies--if they would cause builds/releases to fail.
  3. Only allow for inclusion of properly licensed code.

What does a license file mean if it has more than one license in it? I don't know of a standard answer to this. There's usually descriptive text explaining why there are multiple licenses. Sometimes the code is dual licensed (i.e., "You may use this software under any of the following..."). Sometimes portions of the code are licensed with a compatible license (Apache licensed, with embedded MIT code), which is the case with the code at issue here. Sometimes specific submodules have different licenses. This may be intractable. Some investigation is required to understand if that's an actual issue.

With that in mind, a proposal:

If treat all licenses found in a license file as a conjunctive list ("AND"), then any license in the file which is incompatible would result in a rejection. Provided there are no implementation bugs, this would satisfy goals 1 and 3, but potentially fail with 2. This can be tested post-implementation.

Any implementation bugs have the potential to fail to meet any of the goals, hence the need for strong review, tests, etc.

bdurette pushed a commit to bdurette/gutenberg that referenced this issue Dec 27, 2022
Conjuctive licenses were being ignored in both the `package.json` and within
various LICENSE files. In the first case, this could lead to false negatives
$(e.g., 'MIT AND BSD' being treated as non-compatible). In the second case, the
implementation was such that only one license was returned (whichever detected
license occurred later in `licenseFileStrings`). Based on the ordering of that
list, this was likely to cause a false positive, because the non-compatible
'Apache-2.0' license occurs before any of the compatible licenses.

Progress on WordPress#38461.
@bdurette
Copy link
Contributor

bdurette commented Dec 27, 2022

Some observations, having done the above implementation:

  1. The checkLicense function is probably inadequate, in that it treats known incompatibilities (e.g., 'Apache-2.0') the same as values that it cannot interpret. In both cases, the caller continues to search for other compatible licenses. Perhaps that should return some tri-state value, in which case the value in package.json would be taken directly for the Shaka-Player dependency.
  2. There is a library for handling SPDX expressions (spdx-satisfies) that, if trusted, would allow for the handling of more complex SPDX expressions.

I'm open to reworking the code to use that library, if the relevant approvers are sufficiently trusting of that library.

@gziolo
Copy link
Member

gziolo commented Feb 13, 2023

@mchowning, can you confirm whether #46801 resolves the issue? It definitely improves a few things and adds unit tests for the revisited implementation.

@mchowning
Copy link
Contributor Author

can you confirm whether #46801 resolves the issue?

Looks like it does to me. Certainly puts us in a much better position (especially those tests ❤️ ). Thanks @bdurette !

gziolo pushed a commit that referenced this issue Apr 17, 2023
* Add support in `check-license` for conjunctive (AND) licenses.

Conjuctive licenses were being ignored in both the `package.json` and within
various LICENSE files. In the first case, this could lead to false negatives
$(e.g., 'MIT AND BSD' being treated as non-compatible). In the second case, the
implementation was such that only one license was returned (whichever detected
license occurred later in `licenseFileStrings`). Based on the ordering of that
list, this was likely to cause a false positive, because the non-compatible
'Apache-2.0' license occurs before any of the compatible licenses.

Progress on #38461.

* Add unit tests for checkAllCompatible

* Update CHANGELOG

* Tidy up docblock.
@gziolo
Copy link
Member

gziolo commented Apr 17, 2023

I have finally landed #46801. I will close the issue based on the previous comment from @mchowning. Feel free to reopen if you discover that something is still missing.

@gziolo gziolo closed this as completed Apr 17, 2023
@gziolo
Copy link
Member

gziolo commented Apr 17, 2023

Observe that although one of shaka-player's dependencies are identified as having incompatible licenses (eme-encryption-scheme-polyfill), the shaka-player dependency itself is not identified.

By the way, this is what I see with the latest version from trunk:

Screenshot 2023-04-17 at 13 25 13

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Good First Issue An issue that's suitable for someone looking to contribute for the first time Needs Dev Ready for, and needs developer efforts [Package] Scripts /packages/scripts [Type] Bug An existing feature does not function as intended
Projects
None yet
Development

No branches or pull requests

3 participants