Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embedded license file ingestion for Gallery #6580

Merged
merged 57 commits into from
Nov 8, 2018
Merged

Embedded license file ingestion for Gallery #6580

merged 57 commits into from
Nov 8, 2018

Conversation

agr
Copy link
Contributor

@agr agr commented Oct 18, 2018

Addresses #6545, #6529, #6516, #6532

New column was added to GalleryDB to contain the type of the embedded license file (plain text or markdown). Storage is implemented for the case when no async validation pipeline is present.

I ended up diverging from the spec which prescribed storing a presence of an embedded file as a bool flag because this way for the local storage case we lose the document type information. So an enum was introduced that persists that information in DB.

Warning when package does not contain <license> but specifies <licenseUrl>:
image

Warning when package does not contain any license information:
image

License URL rejection message (when enabled):
image

License file rejection message:
image

Warning when license URL is not aligned with license expression:
image

@joelverhagen joelverhagen dismissed their stale review November 5, 2018 20:33

I haven't seen the latest revision.

@@ -106,6 +106,8 @@ public static class GalleryConstants

public const string GitRepository = "git";

public const string LicenseDeprecationUrl = "https://aka.ms/deprecateLicenseUrl";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://aka.ms/deprecateLicenseUrl [](start = 53, length = 34)

FYI, this link redirects to nuget's documentation home page. Is that intended?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to remember to update it to something useful when the doc becomes available.

}

return null;
}

private static async Task<bool> IsStreamLengthMatchesReportedAsync(Stream licenseFileStream, long reportedLength)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsStreamLengthMatchesReportedAsync [](start = 40, length = 34)

This could be a nice Stream extension method. Maybe you could make a Stream extension class that also has LooksLikeUtf8TextStreamAsync

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like to have a Stream extension methods that read from it. It should be very obvious that this operation consumes Stream (which is important for streams that are not seekable). I don't think extensions methods make it obvious.

throw new InvalidOperationException($"Unsupported license metadata type: {licenseMetadata.Type}");
}

private static bool HasChildElements(XElement xElement)
Copy link
Contributor

@loic-sharma loic-sharma Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HasChildElements [](start = 28, length = 16)

What do you think of moving HasChildElements, GetLicenseVersion and GetLicenseType into LicenseCheckingNuspecReader?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO LicenseCheckingNuspecReader should do all XML-based operations itself and completely abstract away the fact the XML-shape of the nuspec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I specifically moved them out not to call LicenseCheckingNuspecReader.LicenseElement repeatedly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, you could move all the functions into LicenseCheckingNuspecReader such that you can make LicenseElement private.

private class LicenseCheckingNuspecReader : NuspecReader
{
    private XElement _licenseElement => MetadataNode.Element(MetadataNode.Name.Namespace + LicenseNodeName);

    public string GetLicenseVersion() => _licenseElement.GetOptionalAttributeValue("version");

    /* other functions here */
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what is the cost of calling MetadataNode.Element(MetadataNode.Name.Namespace + LicenseNodeName) multiple times and I don't want to care about it: I call it once and reuse the result as much as possible. If I make them members of LicenseCheckingNuspecReader then I'd have to make that Element call in each method or do something atrocious to keep the value between calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't want LicenseCheckingNuspecReader to exist, since it's only purpose is to get access to one protected member of a base class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do that and rolled back the change. It looks ugly.

It makes LicenseCheckingNuspecReader to provide methods that check the content of the license element in various ways (a methods to check if it exists, a method to check its length, a method to check if there are child elements, etc.). Class methods cannot assume licenseElement is not null, so this check creeps into every method whereas in the original code its a single check.

Copy link
Contributor

@scottbommarito scottbommarito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO you should split a lot of this code into separate files. Otherwise looks great.

src/NuGetGallery.Core/Services/CorePackageFileService.cs Outdated Show resolved Hide resolved
src/NuGetGallery.Core/Services/CorePackageFileService.cs Outdated Show resolved Hide resolved
@@ -106,6 +106,8 @@ public static class GalleryConstants

public const string GitRepository = "git";

public const string LicenseDeprecationUrl = "https://aka.ms/deprecateLicenseUrl";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to remember to update it to something useful when the doc becomes available.

src/NuGetGallery/Helpers/TextHelper.cs Outdated Show resolved Hide resolved
src/NuGetGallery/Services/PackageUploadService.cs Outdated Show resolved Hide resolved
throw new InvalidOperationException($"Unsupported license metadata type: {licenseMetadata.Type}");
}

private static bool HasChildElements(XElement xElement)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO LicenseCheckingNuspecReader should do all XML-based operations itself and completely abstract away the fact the XML-shape of the nuspec.

=> licenseElement
.GetOptionalAttributeValue("type");

private class LicenseCheckingNuspecReader : NuspecReader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to its own file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? It is only useful in the context of validation, I don't want to make it visible to all the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is huge. Anything you can do to split it up into more logical parts would improve the readability of it.

Copy link
Contributor Author

@agr agr Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file is under 1000 lines. I've see methods longer than that :)

On a more serious note: moving it into separate file and making it public as a result would require to add a lot of code to be more defensive about how its methods are called. That will bloat its code which will not help the readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created work item to refactor the PackageUploadService, we'll address these issues when we'd be working on it.

{
// if the package is immediately made available, it means there is a high chance we don't have
// validation pipeline that would normally store the license file, so we'll do it ourselves here.
await SavePackageLicenseFile(packageFile, licenseStream => _packageFileService.SaveLicenseFileAsync(package, licenseStream));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of making the method static? IMO the lambda here just makes the code more confusing--you're never going to be making this function do anything else other than save the file, so you've created a method named SavePackageLicenseFile that needs to be told how to save the package license file. You should just pass in the Package and call SaveLicenseFileAsync directly.

One other option: if the stream returned by packageArchiveReader.GetEntry is still valid after packageArchiveReader is disposed, you could rename SavePackageLicenseFile to GetPackageLicenseFile and then return the stream so you don't have to pass the Package to the function.

@agr agr merged commit 95cbe09 into dev Nov 8, 2018
@agr agr deleted the agr-license-ingest branch November 8, 2018 01:53
scottbommarito pushed a commit that referenced this pull request Nov 9, 2018
Enabling accepting license metadata. License files are not yet allowed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants