Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for scanning Java packages (#1463) #1476

Merged
merged 30 commits into from
Jan 6, 2022

Conversation

anthonyharrison
Copy link
Contributor

No description provided.

@codecov-commenter
Copy link

codecov-commenter commented Dec 23, 2021

Codecov Report

Merging #1476 (74e251a) into main (82c6d6b) will decrease coverage by 0.74%.
The diff coverage is 7.27%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1476      +/-   ##
==========================================
- Coverage   82.32%   81.57%   -0.75%     
==========================================
  Files         279      279              
  Lines        5454     5509      +55     
  Branches      884      900      +16     
==========================================
+ Hits         4490     4494       +4     
- Misses        774      824      +50     
- Partials      190      191       +1     
Flag Coverage Δ
longtests 81.57% <7.27%> (-0.75%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
cve_bin_tool/version_scanner.py 56.43% <7.27%> (-18.40%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 82c6d6b...74e251a. Read the comment docs.

@terriko
Copy link
Contributor

terriko commented Dec 29, 2021

I swear, holiday time is the worst time for CI infrastructure bugs. Looks like the CI run had some network issue, I'm going to try running it again.

Copy link
Contributor

@terriko terriko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, just one question about whether we can reasonably do some additional validation on the xml.


def run_java_checker(self, filename, lines):
"""Process maven pom.xml file and extract product and dependency details"""
tree = ET.parse(filename)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does maven have any sort of xmlschema we should be using to verify that the data is valid? I feel like it should have a published one we could use (and possibly store locally in our source tree to avoid the network connection), but I won't be surprised if that's not the case.

@anthonyharrison
Copy link
Contributor Author

anthonyharrison commented Dec 29, 2021 via email

@terriko
Copy link
Contributor

terriko commented Jan 5, 2022

It looks like there's an xmlschema package similar to the jsonschema one we already use: https://xmlschema.readthedocs.io/en/latest/usage.html

It looks like it's also an option in lxml, but I believe bandit will raise issues if we use lxml directly. I don't see anything when I search for "schema" in defusedxml.ElementTree so I'm guessing it doesn't have a similar function though you'd think that'd be a nice addition.

If we can validate I feel like it's good practice to do so, although I do wonder if we'll find that the XML is frequently non-compliant in practice. Only one way to find out, though!

@anthonyharrison
Copy link
Contributor Author

anthonyharrison commented Jan 5, 2022 via email

@terriko
Copy link
Contributor

terriko commented Jan 5, 2022

BTW, the other thing our xml security folk recommend is forcing the use of an encoding. I can't find anything specifically designed to do this in the xml libraries, but we probably could force one with python's usual .encode() if we wanted to be explicit but I haven't looked into whether that's needed or it's already being implicitly done under the hood.

@anthonyharrison
Copy link
Contributor Author

anthonyharrison commented Jan 5, 2022 via email

@terriko
Copy link
Contributor

terriko commented Jan 6, 2022

I don't think any of these schemas change super rapidly the way the nvd data does. If that's true, and also they're not huge and there's no licensing issues (I'm not even sure if you can license a schema or if that's considered like an API... might have to do some research) then we'd probably save a lot of time for users if we store them in the repo somewhere have an update script that could be run by the user but mostly is used by Github Actions to do a regular auto-update.

@anthonyharrison
Copy link
Contributor Author

anthonyharrison commented Jan 6, 2022 via email

@terriko
Copy link
Contributor

terriko commented Jan 6, 2022

And this is all starting to sound like a whole separate feature. Given that it looks like defusedxml will fail correctly (but not elegantly) on malformed data I don't think there's a huge security risk if we put the schema validation part into a separate PR if you want to have me merge this one and iterate while I figure out the licensing implications. Let me know!

@anthonyharrison
Copy link
Contributor Author

anthonyharrison commented Jan 6, 2022 via email

Copy link
Contributor

@terriko terriko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let's get this one merged and I'll open up a separate issue about improving xml schema validation.

@terriko terriko merged commit 28f9dad into intel:main Jan 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants