feat: Add support for scanning Java packages (#1463) #1476

anthonyharrison · 2021-12-23T20:20:20Z

No description provided.

Resync repository

codecov-commenter · 2021-12-23T20:25:21Z

Codecov Report

Merging #1476 (74e251a) into main (82c6d6b) will decrease coverage by 0.74%.
The diff coverage is 7.27%.

@@            Coverage Diff             @@
##             main    #1476      +/-   ##
==========================================
- Coverage   82.32%   81.57%   -0.75%     
==========================================
  Files         279      279              
  Lines        5454     5509      +55     
  Branches      884      900      +16     
==========================================
+ Hits         4490     4494       +4     
- Misses        774      824      +50     
- Partials      190      191       +1

| Flag | Coverage Δ | |
| --- | --- | --- |
| longtests | `81.57% <7.27%> (-0.75%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| cve_bin_tool/version_scanner.py | `56.43% <7.27%> (-18.40%)` | ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 82c6d6b...74e251a.

terriko · 2021-12-29T20:38:54Z

I swear, holiday time is the worst time for CI infrastructure bugs. Looks like the CI run had some network issue, so I'm going to try running it again.

terriko

This looks good to me, just one question about whether we can reasonably do some additional validation on the xml.

terriko · 2021-12-29T21:11:24Z

cve_bin_tool/version_scanner.py

+
+    def run_java_checker(self, filename, lines):
+        """Process maven pom.xml file and extract product and dependency details"""
+        tree = ET.parse(filename)


Does maven have any sort of xmlschema we should be using to verify that the data is valid? I feel like it should have a published one we could use (and possibly store locally in our source tree to avoid the network connection), but I won't be surprised if that's not the case.

anthonyharrison · 2021-12-29T21:27:04Z

Terri,

There is a schema (https://maven.apache.org/xsd/maven-4.0.0.xsd). Is there something we need to add to ensure that the XML document is valid before we start processing it? We may have to do something similar for the SBOM XML files.

The root element of a pom.xml file references the schema like this:

    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

terriko · 2022-01-05T01:15:32Z

It looks like there's an xmlschema package similar to the jsonschema one we already use: https://xmlschema.readthedocs.io/en/latest/usage.html

It looks like it's also an option in lxml, but I believe bandit will raise issues if we use lxml directly. I don't see anything when I search for "schema" in defusedxml.ElementTree so I'm guessing it doesn't have a similar function though you'd think that'd be a nice addition.

If we can validate I feel like it's good practice to do so, although I do wonder if we'll find that the XML is frequently non-compliant in practice. Only one way to find out, though!

anthonyharrison · 2022-01-05T15:00:21Z

Terri,

I will have a look. I agree that defusedxml doesn't have an equivalent function, although lxml is used 'under the hood' by defusedxml. I will try to write a small test script to see what happens when I validate using the XML schema component, and also see what Bandit says!

Regards,
Anthony

terriko · 2022-01-05T18:44:28Z

BTW, the other thing our xml security folk recommend is forcing the use of an encoding. I can't find anything specifically designed to do this in the xml libraries, but we could probably force one with python's usual .encode() if we wanted to be explicit. I haven't looked into whether that's needed or whether it's already being done implicitly under the hood.

anthonyharrison · 2022-01-05T22:13:57Z

Terri,

Done a bit of experimenting and I think I may have a solution. The following code seems to do the job:

    import xmlschema

    def validate_xml(filename, xsd_file):
        theschema = xmlschema.XMLSchema(xsd_file)
        try:
            result = theschema.validate(filename)
        except Exception:
            result = "Fail"
        return result is None

    filename = "cyclonedx_test.xml"
    xsd_file = "https://cyclonedx.org/schema/bom/1.3"
    if validate_xml(filename, xsd_file):
        print(f"Let's process {filename}")
    else:
        print(f"Oh dear! {filename} is not a valid XML file")

And when I run bandit, I don't get any issues. A malformed XML file (e.g. a typo in one of the tags) will result in validation failure. I also noticed that the parse function of defusedxml will raise an exception when it is called if the file is of an invalid format, so we were probably already covered, albeit in an untidy way.

Going forward, I think I need to add the XML validate function to the utils package and add something to the test suite (SBOM). However, we also need to work out what we do with the schemas. I assume that we should have local copies of the schemas so that we can do the validation when in offline mode. However, do you think it would be a good idea to grab copies of the schemas from the appropriate repos as part of the install process and then store them in a local area, rather than having them locally within our repo? Where do you think they should be stored? In the .cache directory (e.g. a subdirectory of .cache/cve-bin-tool/schemas) or as part of the Python installation?

A bit more work for me to do, but probably not for the next few days as I am travelling on business and I have to tidy up the FOSDEM slides before the weekend.

Regarding the encoding suggestion, is this what is required? Replace

    tree = ET.parse(filename)

with

    xmlparser = ET.XMLParser(encoding="utf-8")
    tree = ET.parse(filename, parser=xmlparser)

Regards,
Anthony
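
For anyone wanting to try the explicit-encoding idea from this comment end to end, here is a minimal sketch. It assumes the standard library's `xml.etree.ElementTree` as a stand-in so that it runs without extra packages (the PR itself uses defusedxml, whose ElementTree module should accept a `parser` argument in the same way). The sample POM content and the helper names `parse_with_utf8`, `read_pom_identity` and `demo` are hypothetical and not taken from the PR.

```python
import os
import tempfile
import xml.etree.ElementTree as ET  # stand-in; real code would use defusedxml.ElementTree

# Namespace declared by Maven POM files (from the <project> element quoted in this thread).
POM_NS = {"pom": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical sample data, made up for illustration only.
SAMPLE_POM = """<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <groupId>org.example</groupId>
  <artifactId>demo-parent</artifactId>
  <version>1.2.3</version>
</project>
"""


def parse_with_utf8(filename):
    """Parse an XML file with a parser that is explicitly told to use UTF-8."""
    parser = ET.XMLParser(encoding="utf-8")
    return ET.parse(filename, parser=parser)


def read_pom_identity(filename):
    """Return (artifactId, version) from the top level of a pom.xml file."""
    root = parse_with_utf8(filename).getroot()
    artifact = root.findtext("pom:artifactId", namespaces=POM_NS)
    version = root.findtext("pom:version", namespaces=POM_NS)
    return artifact, version


def demo():
    """Write the sample POM to a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pom.xml")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(SAMPLE_POM)
        return read_pom_identity(path)
```

The encoding passed to `XMLParser` overrides whatever the XML declaration in the file claims, which is the "be explicit" behaviour being asked about.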

terriko · 2022-01-06T18:33:09Z

I don't think any of these schemas change super rapidly the way the nvd data does. If that's true, and also they're not huge and there are no licensing issues (I'm not even sure if you can license a schema or if that's considered like an API... might have to do some research), then we'd probably save a lot of time for users if we store them in the repo somewhere and have an update script that could be run by the user but is mostly used by GitHub Actions to do a regular auto-update.
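
A rough sketch of what such an update script might look like, using only the standard library. Everything here is an assumption for illustration: the function names, the `SCHEMA_SOURCES` mapping and the destination directory are hypothetical, and only the Maven XSD URL comes from this thread. The `fetch` parameter is injectable so the logic can be exercised without a network connection.

```python
from pathlib import Path
from urllib.request import urlopen

# Hypothetical mapping of local file name -> upstream URL.
# The Maven URL is the one mentioned earlier in this thread.
SCHEMA_SOURCES = {
    "maven-4.0.0.xsd": "https://maven.apache.org/xsd/maven-4.0.0.xsd",
}


def fetch_url(url):
    """Download a URL and return its body as bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def update_schemas(dest_dir, sources=None, fetch=fetch_url):
    """Refresh local schema copies; return the names of files that changed."""
    sources = SCHEMA_SOURCES if sources is None else sources
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    updated = []
    for name, url in sources.items():
        data = fetch(url)
        target = dest / name
        # Only rewrite the file when the upstream content differs,
        # so a scheduled job produces no diff when nothing changed.
        if not target.exists() or target.read_bytes() != data:
            target.write_bytes(data)
            updated.append(name)
    return updated
```

A scheduled CI job could call `update_schemas` on a directory inside the source tree and open a pull request only when the returned list is non-empty.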

anthonyharrison · 2022-01-06T18:39:13Z

Hi Terri,

I will have a look at assembling a set of schemas. I think I need a schema for SPDX and CycloneDX SBOMs and one for the maven pom file. I assume that we will need to update the setup script to copy the files from the repo.

Regards,
Anthony

terriko · 2022-01-06T18:40:46Z

And this is all starting to sound like a whole separate feature. Given that it looks like defusedxml will fail correctly (but not elegantly) on malformed data, I don't think there's a huge security risk if we put the schema validation part into a separate PR, if you want to have me merge this one and iterate while I figure out the licensing implications. Let me know!
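
To make the "fail correctly but not elegantly" point concrete, here is a small sketch of catching the parse failure so that a broken file is skipped rather than aborting with a traceback. It assumes the standard library's `xml.etree.ElementTree` as a stand-in for defusedxml so that it runs without extra packages, and the helper names `parse_xml_or_none` and `demo` are hypothetical. Note that this only detects XML that is not well-formed; a well-formed file that violates the Maven schema would still parse, which is the gap schema validation would close.

```python
import os
import tempfile
import xml.etree.ElementTree as ET  # stand-in; real code would use defusedxml.ElementTree


def parse_xml_or_none(filename):
    """Return the parsed tree, or None if the file is not well-formed XML."""
    try:
        return ET.parse(filename)
    except ET.ParseError as error:
        print(f"Skipping {filename}: not well-formed XML ({error})")
        return None


def demo():
    """Parse one broken and one valid file; return (bad_result, good_root_tag)."""
    with tempfile.TemporaryDirectory() as tmp:
        bad = os.path.join(tmp, "bad.xml")
        good = os.path.join(tmp, "good.xml")
        with open(bad, "w", encoding="utf-8") as handle:
            # Typo in the closing tag makes the document malformed.
            handle.write("<project><version>1.0</versoin></project>")
        with open(good, "w", encoding="utf-8") as handle:
            handle.write("<project><version>1.0</version></project>")
        bad_result = parse_xml_or_none(bad)
        good_tag = parse_xml_or_none(good).getroot().tag
        return bad_result, good_tag
```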

anthonyharrison · 2022-01-06T18:43:12Z

Terri,

I agree that the validation of XML is a separate feature, as it is not just related to the java scanner.

Anthony

terriko

Okay, let's get this one merged and I'll open up a separate issue about improving xml schema validation.

anthonyharrison and others added 27 commits May 27, 2020 12:55

Merge pull request #1 from intel/master

771960a

Resync repository

Merge remote-tracking branch 'upstream/master'

20d6262

Merge remote-tracking branch 'upstream/master'

5ab2c87

Merge remote-tracking branch 'upstream/master'

ea57d05

Merge remote-tracking branch 'upstream/master'

c2a8d1c

Merge remote-tracking branch 'upstream/master'

7035ae9

Merge remote-tracking branch 'upstream/master'

aaba9eb

Merge remote-tracking branch 'upstream/master'

3b91b0e

Merge remote-tracking branch 'upstream/master'

13fa1a8

Merge remote-tracking branch 'upstream/master'

5db21eb

Merge branch 'main'

9418560

Merge branch 'main'

0f3f754

Merge remote-tracking branch 'refs/remotes/upstream/main'

9e04dab

Merge branch 'intel:main' into master

da7e34a

Merge branch 'intel:main' into master

8c9ea48

Merge branch 'intel:main' into master

9581317

Merge branch 'intel:main' into master

c59de25

Merge branch 'intel:main' into master

bf5908a

Merge branch 'intel:main' into master

2505e10

Merge branch 'intel:main' into master

589b93d

Merge branch 'intel:main' into master

42757cb

Merge branch 'intel:main' into master

0f519a3

Merge branch 'intel:main' into master

4197fde

Merge branch 'intel:main' into master

6a9494b

Merge branch 'intel:main' into master

187f669

feat: Add support for scanning Java packages (intel#1463)

50ae124

feat: Add support for scanning Java packages (intel#1463)

237b05a

feat: Add support for scanning Java packages (intel#1463)

7135ad1

anthonyharrison mentioned this pull request Dec 24, 2021

log4J Spring and Angular are checked ? #1472

Closed

anthonyharrison added 2 commits December 26, 2021 14:40

feat: Add support for scanning Java packages (intel#1463)

9e3d9cc

feat: Add support for scanning Java packages (intel#1463)

74e251a

terriko requested changes Dec 29, 2021

terriko approved these changes Jan 6, 2022

terriko mentioned this pull request Jan 6, 2022

Add XML Schema validation to places we use XML #1507

Closed

terriko merged commit 28f9dad into intel:main Jan 6, 2022
