feat: Add support for scanning Java packages (#1463) #1476
Codecov Report
@@            Coverage Diff             @@
##             main    #1476      +/-   ##
==========================================
- Coverage   82.32%   81.57%   -0.75%
==========================================
  Files         279      279
  Lines        5454     5509      +55
  Branches      884      900      +16
==========================================
+ Hits         4490     4494       +4
- Misses        774      824      +50
- Partials      190      191       +1
|
I swear, holiday time is the worst time for CI infrastructure bugs. It looks like the CI run had a network issue, so I'm going to try running it again.
This looks good to me, just one question about whether we can reasonably do some additional validation on the xml.
|
def run_java_checker(self, filename, lines):
    """Process maven pom.xml file and extract product and dependency details"""
    tree = ET.parse(filename)
Does maven have any sort of xmlschema we should be using to verify that the data is valid? I feel like it should have a published one we could use (and possibly store locally in our source tree to avoid the network connection), but I won't be surprised if that's not the case.
Terri
There is a schema: https://maven.apache.org/xsd/maven-4.0.0.xsd
Is there something we need to add to ensure that the XML document is valid
before we start processing it? We may have to do something similar for the
SBOM XML files.
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
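Because the POM declares a default namespace, element names are namespace-qualified once parsed, so plain tag lookups find nothing. A minimal sketch of reading the coordinates, assuming the standard library's `xml.etree.ElementTree` as a stand-in for whatever parser the PR uses (for untrusted input, `defusedxml.ElementTree` provides a `fromstring` call as well). The sample POM and the helper name `pom_coordinates` are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used only for lookups; the prefix "pom" is arbitrary.
POM_NS = {"pom": "http://maven.apache.org/POM/4.0.0"}

# Illustrative POM, not taken from the pull request.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <groupId>org.example</groupId>
  <artifactId>demo-parent</artifactId>
  <version>1.2.3</version>
</project>"""


def pom_coordinates(xml_text):
    """Return (groupId, artifactId, version) from a POM document string."""
    root = ET.fromstring(xml_text)
    return tuple(
        root.findtext(f"pom:{tag}", namespaces=POM_NS)
        for tag in ("groupId", "artifactId", "version")
    )


print(pom_coordinates(SAMPLE_POM))
```

Looking up `groupId` without the namespace mapping returns `None`, which is an easy mistake to make when a POM is treated as plain XML.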
On Wed, 29 Dec 2021 at 21:12, Terri Oda requested changes on this pull request:
In cve_bin_tool/version_scanner.py
<#1476 (comment)>:
> + self.logger.debug(f"Try alternative product {product}")
+ # Remove parent appendage
+ if "-parent" in product:
+ product = product.replace("-parent", "")
+ product = product.replace("-", "_")
+ vendor_package_pair = self.cve_db.get_vendor_product_pairs(product)
+ if vendor_package_pair != []:
+ vendor = vendor_package_pair[0]["vendor"]
+ file_path = "".join(self.file_stack)
+ self.logger.debug(f"{file_path} {product} {version} by {vendor}")
+ return ProductInfo(vendor, product, version), file_path
+ return None, None
+
+ def run_java_checker(self, filename, lines):
+ """Process maven pom.xml file and extract product and dependency details"""
+ tree = ET.parse(filename)
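The fallback name handling in the hunk above can be tried in isolation. A minimal sketch, assuming the hyphen replacement applies to every product name (the indentation is lost in the quoted hunk, so this is a guess); `normalize_product` is a hypothetical helper name, not a function in cve-bin-tool:

```python
def normalize_product(product: str) -> str:
    """Mimic the quoted fallback: drop '-parent', then map '-' to '_'."""
    # Remove the parent appendage, as in the hunk.
    if "-parent" in product:
        product = product.replace("-parent", "")
    # Assumption: this step runs for all names, not only for '-parent' ones.
    return product.replace("-", "_")


print(normalize_product("spring-boot-starter-parent"))
print(normalize_product("commons-io"))
```

The normalized name is what would then be passed to the vendor lookup shown in the hunk.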
|
It looks like there's an xmlschema package similar to the jsonschema one we already use (https://xmlschema.readthedocs.io/en/latest/usage.html). It looks like it's also an option in lxml, but I believe bandit will raise issues if we use lxml directly. I don't see anything when I search for "schema" in defusedxml.ElementTree, so I'm guessing it doesn't have a similar function, though you'd think that'd be a nice addition. If we can validate, I feel like it's good practice to do so, although I do wonder if we'll find that the XML is frequently non-compliant in practice. Only one way to find out, though!
Terri
I will have a look. I agree that defusedxml doesn't have an equivalent
function although lxml is used 'under the hood' by defusedxml. Will try and
write a small test script to see what happens when I validate using the XML
schema component and also see what Bandit says!
Regards Anthony
(In reply to Terri Oda's comment of Wed, 5 Jan 2022, 01:15.)
|
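BTW, the other thing our xml security folk recommend is forcing the use of an encoding. I can't find anything specifically designed to do this in the xml libraries, but we probably could force one with python's usual .encode() if we wanted to be explicit, but I haven't looked into whether that's needed or it's already being implicitly done under the hood.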
Terri
Done a bit of experimenting and I think I may have a solution.
The following code seems to do the job:

import xmlschema


def validate_xml(filename, xsd_file):
    """Return True if filename validates against the schema in xsd_file."""
    theschema = xmlschema.XMLSchema(xsd_file)
    try:
        # validate() returns None on success and raises on failure
        theschema.validate(filename)
    except Exception:
        # Any failure (schema violation or unparsable file) counts as invalid
        return False
    return True


filename = "cyclonedx_test.xml"
xsd_file = "https://cyclonedx.org/schema/bom/1.3"
if validate_xml(filename, xsd_file):
    print(f"Let's process {filename}")
else:
    print(f"Oh dear! {filename} is not a valid XML file")
And when I run bandit, I don't get any issues.
A malformed XML file (e.g. a typo in one of the tags) will result in a
validation failure. I also noticed that the parse function of defusedxml
raises an exception if the file is of an invalid format, so we were
probably already covered, albeit in an untidy way.
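The parse-time failure described above can be seen with the standard library parser, used here as a stand-in (an assumption made so the sketch has no third-party dependency; defusedxml itself is not imported). A mistyped closing tag raises `ParseError`, which a caller can catch to fail more tidily; `is_well_formed` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for mismatched tags, unclosed elements, and similar errors.
        return False
    return True


print(is_well_formed("<bom><component/></bom>"))
print(is_well_formed("<bom><component></bom>"))
```

This only checks well-formedness, which is a weaker property than validity against a schema: a document can parse cleanly and still violate the schema.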
Going forward, I think I need to add the XML validate function to the utils
package and add something to the test suite (SBOM). However, we also need
to work out what we do with the schemas. I assume that we should have local
copies of the schemas so that we can do the validation when in offline
mode. Do you think it would be a good idea to grab copies of the schemas
from the appropriate repos as part of the install process and store them
in a local area, rather than keeping them within our repo? Where do you
think they should be stored: in the .cache directory (e.g. a subdirectory
such as .cache/cve-bin-tool/schemas) or as part of the Python installation?
A bit more work for me to do but probably not for the next few days as I am
travelling on business and I have to tidy up the FOSDEM slides before the
weekend.
Regarding the encoding suggestion, is this what is required?
Replace
    tree = ET.parse(filename)
with
    xmlparser = ET.XMLParser(encoding="utf-8")
    tree = ET.parse(filename, parser=xmlparser)
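To see what passing `encoding` to the parser does, here is a small sketch using the standard library's `xml.etree.ElementTree` (an assumption; the thread's `ET` may be `defusedxml.ElementTree`). The sample bytes are invented: they are UTF-8 encoded but carry a declaration that claims ISO-8859-1, so the result differs depending on whether the parser trusts the declaration. `parse_name` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# UTF-8 bytes for "café", with a declaration that wrongly says ISO-8859-1.
DATA = b'<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\xc3\xa9</name>'


def parse_name(data, encoding=None):
    """Parse data and return the root element's text.

    With encoding=None the parser follows the document's own declaration;
    otherwise the given encoding overrides it.
    """
    parser = ET.XMLParser(encoding=encoding)
    return ET.fromstring(data, parser=parser).text


print(repr(parse_name(DATA)))                    # follows the declaration
print(repr(parse_name(DATA, encoding="utf-8")))  # forced to UTF-8
```

Forcing an encoding means the file's own declaration is no longer trusted, which appears to be the intent of the recommendation quoted in this thread.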
Regards
Anthony
…On Wed, 5 Jan 2022 at 18:44, Terri Oda ***@***.***> wrote:
BTW, the other thing our xml security folk recommend is forcing the use of
an encoding. I can't find anything specifically designed to do this in the
xml libraries, but we probably could force one with python's usual
.encode() if we wanted to be explicit but I haven't looked into whether
that's needed or it's already being implicitly done under the hood.
|
I don't think any of these schemas change super rapidly the way the nvd data does. If that's true, and also they're not huge and there's no licensing issues (I'm not even sure if you can license a schema or if that's considered like an API... might have to do some research), then we'd probably save a lot of time for users if we store them in the repo somewhere and have an update script that could be run by the user but mostly is used by Github Actions to do a regular auto-update.
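One way the "store them in the repo" idea could look from the calling side is a lookup that resolves a schema by document type from a local directory, so no network access is needed at scan time. This is only a sketch under stated assumptions: the file names, the `SCHEMA_FILES` mapping, and the `find_schema` helper are all hypothetical, and the thread does not settle where the directory would live.

```python
from pathlib import Path

# Hypothetical file names; the thread does not say how schemas would be named.
SCHEMA_FILES = {
    "pom": "maven-4.0.0.xsd",
    "cyclonedx": "cyclonedx-bom-1.3.xsd",
}


def find_schema(kind, schema_dir):
    """Return the local path of the schema for `kind`, or None if unavailable."""
    name = SCHEMA_FILES.get(kind)
    if name is None:
        return None  # unknown document type
    path = Path(schema_dir) / name
    # Only report schemas that are actually present on disk.
    return path if path.is_file() else None
```

A caller could skip validation, or fall back to parse-only checks, when `find_schema` returns `None`.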
Hi Terri
I will have a look at assembling a set of schemas. I think I need a schema
for SPDX and CycloneDX SBOMs and one for the maven pom file. I assume that
we will need to update the setup script to copy the files from the repo.
Regards Anthony
(In reply to Terri Oda's comment of Thu, 6 Jan 2022, 18:33.)
|
And this is all starting to sound like a whole separate feature. Given that it looks like defusedxml will fail correctly (but not elegantly) on malformed data, I don't think there's a huge security risk if we put the schema validation part into a separate PR. That way I can merge this one and we can iterate while I figure out the licensing implications, if you want. Let me know!
Terri
I agree that the validation of XML is a separate feature as it is not just
related to the java scanner.
Anthony
(In reply to Terri Oda's comment of Thu, 6 Jan 2022, 18:40.)
|
Okay, let's get this one merged and I'll open up a separate issue about improving xml schema validation.
No description provided.