[no-relnote] Add test to check CDI injection by default #1169

ArangoGutierrez · 2025-06-30T12:19:45Z

The NVIDIA Container Toolkit now uses CDI injection by default, which does not trigger the code in the nvidia-container-cli that creates the empty files on the host.

coveralls · 2025-06-30T12:22:01Z

Pull Request Test Coverage Report for Build 16028510793

Details

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 33.21%

Totals
Change from base Build 16002648947:	0.0%
Covered Lines:	4381
Relevant Lines:	13192

💛 - Coveralls

elezar · 2025-06-30T13:05:03Z

tests/e2e/nvidia-container-toolkit_test.go

@@ -240,6 +240,22 @@ var _ = Describe("docker", Ordered, ContinueOnFailure, func() {
 		BeforeAll(func(ctx context.Context) {
 			_, _, err := runner.Run("docker pull ubuntu")
 			Expect(err).ToNot(HaveOccurred())
+
+			err = buildDockerImage(


I think this test should be in its own section. We can then test the different behaviours based on runtime mode. For example, if we run docker run --runtime=runc --gpus all, we would expect to see an error due to the path traversal.

elezar · 2025-06-30T13:05:33Z

tests/e2e/runner.go

+func buildDockerImage(runner Runner, tag string, args map[string]string, dockerfile string) error {
+	var buildArgs string
+	for k, v := range args {
+		buildArgs += fmt.Sprintf("--build-arg %s=%q ", k, v)
+	}
+
+	command := fmt.Sprintf("docker build -t %s %s - <<EOF\n%s\nEOF", tag, buildArgs, dockerfile)
+
+	_, _, err := runner.Run(command)
+	return err
+}


I think the raw script is easier to follow.

elezar · 2025-06-30T13:06:15Z

tests/e2e/nvidia-container-toolkit_test.go

+			_, _, err := runner.Run("docker run --rm --runtime=nvidia --gpus=all firmware-test")
+			Expect(err).To(HaveOccurred())
+
+			containerOutput, _, err := runner.Run("docker run --rm --runtime=runc --gpus=all firmware-test")


This is EXPECTED to fail.

elezar · 2025-06-30T13:06:49Z

tests/e2e/nvidia-container-toolkit_test.go

+			_, _, err := runner.Run("docker run --rm --runtime=nvidia --gpus=all firmware-test")
+			Expect(err).To(HaveOccurred())


With the v1.18.0 release this uses CDI by default which means that there will be NO error. This should be a different test case.
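
The two expectations raised in these comments can be written down as a small table. This is only a sketch: the `runCase` type and the `dockerRunCommand` helper are hypothetical names that do not appear in the PR, while the runtimes, the image tag, and the expected outcomes are taken from the review comments above.

```go
package main

import "fmt"

// runCase is a hypothetical type pairing a docker runtime with the outcome
// the reviewers expect for the firmware-test image.
type runCase struct {
	name    string
	runtime string
	wantErr bool
}

// cases reflects the review discussion: the nvidia runtime uses CDI by
// default (no error expected), while runc with --gpus=all is expected to fail.
var cases = []runCase{
	{name: "CDI", runtime: "nvidia", wantErr: false},
	{name: "runc with --gpus", runtime: "runc", wantErr: true},
}

// dockerRunCommand builds the command line used in the diff for a runtime.
func dockerRunCommand(runtime string) string {
	return fmt.Sprintf("docker run --rm --runtime=%s --gpus=all firmware-test", runtime)
}

func main() {
	for _, c := range cases {
		fmt.Printf("%s: %s (expect error: %t)\n", c.name, dockerRunCommand(c.runtime), c.wantErr)
	}
}
```

Keeping the cases separate makes it explicit that a failure is only expected on the non-CDI path.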

elezar · 2025-07-01T15:17:22Z

tests/e2e/nvidia-container-toolkit_test.go

+			Expect(err).ToNot(HaveOccurred())
+		})
+
+		It("should not create empty firmware files on the host when using CDI", func(ctx context.Context) {


This doesn't actually check that the files are not created on this host.

I have now edited the test description to be in line with the context

elezar · 2025-07-01T15:18:36Z

tests/e2e/nvidia-container-toolkit_test.go

+			Expect(output).To(BeEmpty())
+		})
+
+		It("should fail when using the pre-start hook", func(ctx context.Context) {


Do we refer to this as "legacy" mode in other tests?

On another open PR we do add a "legacy" label: https://github.com/NVIDIA/nvidia-container-toolkit/pull/1168/files#diff-d1fe210afb2df1ba6dd0ebbd1129033e4416de4d8146f52122811c41f081ae75R280

label added

That's not what I asked. In the other test, where we explicitly want to trigger the legacy code path, we use:

It("should work with nvidia-container-runtime-hook", func(ctx context.Context) {

Let's update this one to be:

It("should fail when using the nvidia-container-runtime-hook", func(ctx context.Context) {

test description edited

elezar · 2025-07-02T08:39:13Z

tests/e2e/nvidia-container-toolkit_test.go

+		It("should not fail when using CDI", func(ctx context.Context) {
+			output, _, err := runner.Run("docker run --rm --runtime=nvidia --gpus=all firmware-test")
+			Expect(err).ToNot(HaveOccurred())
+			Expect(output).To(BeEmpty())


Let's update the tests to actually check that no files are created. Using an affected version of the toolkit we see:

$ docker run --rm -ti --gpus=all --runtime=nvidia firmware-test true
$ ls
gsp_ga10x.bin gsp_tu10x.bin

So we would ensure that ls gsp_*.bin is empty.
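
One way to express this check in Go is to filter the directory listing for the firmware file names. The sketch below is an assumption about how that could look, not code from the PR: `firmwareFiles` is a hypothetical helper, and it expects the whitespace-separated output of an `ls` command run through the runner.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// firmwareFiles returns the entries of an `ls` listing whose names match
// gsp_*.bin. The helper name is hypothetical.
func firmwareFiles(lsOutput string) []string {
	var found []string
	for _, name := range strings.Fields(lsOutput) {
		// path.Match only returns an error for a malformed pattern, and
		// this pattern is a constant, so the error is ignored here.
		if ok, _ := path.Match("gsp_*.bin", name); ok {
			found = append(found, name)
		}
	}
	return found
}

func main() {
	// Listing as seen with an affected toolkit version (from the comment above).
	fmt.Println(firmwareFiles("gsp_ga10x.bin gsp_tu10x.bin\n"))
	// An empty listing yields no firmware files.
	fmt.Println(len(firmwareFiles("")))
}
```

A test could then assert that the returned slice is empty after each container run.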

elezar · 2025-07-02T08:44:53Z

tests/e2e/nvidia-container-toolkit_test.go

+			_, _, err := runner.Run("docker pull ubuntu")
+			Expect(err).ToNot(HaveOccurred())
+
+			_, _, err = runner.Run(`docker build -t firmware-test --build-arg RM_VERSION="$(basename $(ls -d /lib/firmware/nvidia/*.*))" --build-arg CURRENT_DIR="$(pwd)" - <<EOF


If we set CURRENT_DIR to a tmp folder we can clean it up between each of the test cases to ensure that we're properly testing that we're NOT creating empty files on the host.

elezar · 2025-07-02T08:45:21Z

tests/e2e/nvidia-container-toolkit_test.go

+		It("should fail when using the nvidia-container-runtime-hook", Label("legacy"), func(ctx context.Context) {
+			_, stderr, err := runner.Run("docker run --rm --runtime=runc --gpus=all firmware-test")
+			Expect(err).To(HaveOccurred())
+			Expect(stderr).To(ContainSubstring("nvidia-container-cli.real: mount error: path error:"))


In addition to this we should also check that files are not created.

elezar · 2025-07-02T11:04:41Z

tests/e2e/nvidia-container-toolkit_test.go

+				tmpDir, _, err = runner.Run("mktemp -d")
+				Expect(err).ToNot(HaveOccurred())
+				tmpDir = strings.TrimSpace(tmpDir)


Is there no per-test tmp dir that we can use?

This is what the BeforeEach is doing: it creates a new tmp dir per test (per It block).
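
The per-test directory setup can be sketched outside of Ginkgo as follows. The `Runner` interface mirrors the `Run` signature shown in tests/e2e/runner.go; `fakeRunner` and `newTempDir` are hypothetical names introduced only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Runner mirrors the Run signature shown in tests/e2e/runner.go.
type Runner interface {
	Run(script string) (string, string, error)
}

// fakeRunner is a hypothetical stand-in that records the scripts it receives
// and returns a canned stdout instead of executing anything.
type fakeRunner struct {
	stdout  string
	scripts []string
}

func (f *fakeRunner) Run(script string) (string, string, error) {
	f.scripts = append(f.scripts, script)
	return f.stdout, "", nil
}

// newTempDir creates a directory on the runner's host and returns its path
// without the trailing newline that mktemp prints.
func newTempDir(r Runner) (string, error) {
	out, _, err := r.Run("mktemp -d")
	if err != nil {
		return "", fmt.Errorf("failed to create temporary directory: %w", err)
	}
	return strings.TrimSpace(out), nil
}

func main() {
	r := &fakeRunner{stdout: "/tmp/tmp.example\n"}
	dir, err := newTempDir(r)
	fmt.Printf("%q %v\n", dir, err)
}
```

Calling such a helper from BeforeEach gives every It block a fresh directory, so files left behind by one case cannot mask a failure in the next.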

elezar · 2025-07-02T11:10:23Z

tests/e2e/nvidia-container-toolkit_test.go

+				Expect(err).ToNot(HaveOccurred())
+				tmpDir = strings.TrimSpace(tmpDir)
+
+				dockerBuildCmd := fmt.Sprintf(`docker build -t firmware-test --build-arg RM_VERSION="$(basename $(ls -d /lib/firmware/nvidia/*.*))" --build-arg CURRENT_DIR=%q - <<EOF`, tmpDir)


Does %q produce double quotes?

yup -> https://gobyexample.com/string-formatting

"To double-quote strings as in Go source, use %q."

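A quick way to confirm the %q behaviour is to format a sample value. In this sketch `quoteBuildArg` is a hypothetical helper and the paths are made-up examples; only `fmt.Sprintf` with `%q` comes from the diff.

```go
package main

import "fmt"

// quoteBuildArg formats a --build-arg flag the way the diff does, using %q
// for the value. The helper name is hypothetical.
func quoteBuildArg(key, value string) string {
	return fmt.Sprintf("--build-arg %s=%q", key, value)
}

func main() {
	// Prints: --build-arg CURRENT_DIR="/tmp/tmp.example"
	fmt.Println(quoteBuildArg("CURRENT_DIR", "/tmp/tmp.example"))
	// Embedded double quotes are escaped with a backslash, as in Go source.
	// Prints: --build-arg CURRENT_DIR="/tmp/a\"b"
	fmt.Println(quoteBuildArg("CURRENT_DIR", `/tmp/a"b`))
}
```

Note that %q applies Go quoting rules, not shell quoting rules: characters such as `$` or backticks are left as-is and would still be expanded by the shell inside double quotes. For a path produced by mktemp this should not matter.
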
Copilot

Pull Request Overview

This PR updates the test runner to propagate stderr on failures and adds an end-to-end test validating firmware file creation under CDI vs. legacy runtimes.

Propagate actual stderr from runner.Run when a script fails
Add new E2E tests to ensure CDI injection prevents host-side firmware file creation
Import fmt to support formatted commands in tests

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
tests/e2e/runner.go	Return `stderr.String()` as the second value on errors
tests/e2e/nvidia-container-toolkit_test.go	Introduce firmware-file-creation tests with setup and cleanup

Comments suppressed due to low confidence (2)

tests/e2e/nvidia-container-toolkit_test.go:307

This ignores the err return from ls; adding _, _, err := runner.Run(...) followed by Expect(err).ToNot(HaveOccurred()) will ensure any unexpected failures are caught.

			output, _, _ := runner.Run(fmt.Sprintf("ls -A %s", outputDir))

tests/e2e/nvidia-container-toolkit_test.go:277

[nitpick] The variable dockerBuildDockerfile holds the Dockerfile content; renaming it to something like dockerfileContent could improve clarity.

FROM ubuntu

Copilot · 2025-07-02T14:22:34Z

tests/e2e/runner.go

@@ -124,7 +124,7 @@ func (r remoteRunner) Run(script string) (string, string, error) {
 	// Run the script
 	err = session.Run(script)
 	if err != nil {
-		return "", "", fmt.Errorf("script execution failed: %v\nSTDOUT: %s\nSTDERR: %s", err, stdout.String(), stderr.String())
+		return "", stderr.String(), fmt.Errorf("script execution failed: %v\nSTDOUT: %s\nSTDERR: %s", err, stdout.String(), stderr.String())


[nitpick] Consider returning both stdout and stderr on error (e.g., return stdout.String(), stderr.String(), fmt.Errorf(...)) so callers have full context for debugging.

Suggested change

-		return "", stderr.String(), fmt.Errorf("script execution failed: %v\nSTDOUT: %s\nSTDERR: %s", err, stdout.String(), stderr.String())
+		return stdout.String(), stderr.String(), fmt.Errorf("script execution failed: %v\nSTDOUT: %s\nSTDERR: %s", err, stdout.String(), stderr.String())
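
The effect of returning both streams on failure can be illustrated with a local analogue of the runner. This is a sketch under assumptions: `runLocal` is a hypothetical function that shells out through `sh -c` with os/exec rather than an SSH session, and only the error format string is copied from the diff.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// runLocal is a hypothetical local analogue of remoteRunner.Run. It returns
// whatever was captured on stdout and stderr even when the script fails.
func runLocal(script string) (string, string, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command("sh", "-c", script)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return stdout.String(), stderr.String(), fmt.Errorf("script execution failed: %v\nSTDOUT: %s\nSTDERR: %s", err, stdout.String(), stderr.String())
	}
	return stdout.String(), stderr.String(), nil
}

func main() {
	// The script writes to both streams and then exits with a non-zero status.
	out, errOut, err := runLocal("echo partial; echo 'mount error' >&2; exit 1")
	fmt.Printf("stdout=%q stderr=%q failed=%t\n", out, errOut, err != nil)
}
```

With this shape a caller can match on stderr, as the legacy test case does with ContainSubstring, without parsing the wrapped error message.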

Copilot · 2025-07-02T14:22:35Z

tests/e2e/nvidia-container-toolkit_test.go

+
+		AfterAll(func(ctx context.Context) {
+			if outputDir != "" {
+				runner.Run(fmt.Sprintf("rm -rf %s", outputDir))


The cleanup command's error is ignored; you may want to capture and Expect(err).ToNot(HaveOccurred()) to surface any failures during teardown.

Suggested change

-				runner.Run(fmt.Sprintf("rm -rf %s", outputDir))
+				_, _, err := runner.Run(fmt.Sprintf("rm -rf %s", outputDir))
+				Expect(err).ToNot(HaveOccurred())

elezar · 2025-07-02T14:47:39Z

tests/e2e/nvidia-container-toolkit_test.go

@@ -257,4 +258,54 @@ var _ = Describe("docker", Ordered, ContinueOnFailure, func() {
 			Expect(libs).To(ContainElements([]string{"libcuda.so", "libcuda.so.1"}))
 		})
 	})
+
+	When("A container tries to create firmware files on the host", Ordered, func() {


Maybe:

Suggested change

-	When("A container tries to create firmware files on the host", Ordered, func() {
+	When("Running a container where the firmware folder resolves outside the container root", Ordered, func() {

elezar · 2025-07-02T14:48:10Z

tests/e2e/nvidia-container-toolkit_test.go

+	When("A container tries to create firmware files on the host", Ordered, func() {
+		var outputDir string
+		BeforeAll(func(ctx context.Context) {
+			pwd, _, err := runner.Run("pwd")


Why not use the os package?

pwd, err := os.Getwd()
Expect(err).ToNot(HaveOccurred())
workingDir := filepath.Join(pwd, "test-output")
err = os.MkdirAll(workingDir, 0755)
Expect(err).ToNot(HaveOccurred())

Remember that we are running this over SSH. Since the runner interface is designed to be agnostic, we cannot use Go packages for system-level actions; we need to use bash via the runner.
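
To make the distinction concrete, here is a sketch of the same directory setup routed through the runner instead of the os package. The `Runner` interface mirrors the `Run` signature from tests/e2e/runner.go; `recordingRunner`, `makeOutputDir`, and the canned `/home/user` path are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Runner mirrors the Run signature shown in tests/e2e/runner.go.
type Runner interface {
	Run(script string) (string, string, error)
}

// recordingRunner is a hypothetical stand-in for a remote host. It records
// every script and answers `pwd` with a canned working directory.
type recordingRunner struct {
	scripts []string
}

func (r *recordingRunner) Run(script string) (string, string, error) {
	r.scripts = append(r.scripts, script)
	if script == "pwd" {
		return "/home/user\n", "", nil
	}
	return "", "", nil
}

// makeOutputDir resolves and creates the test-output directory on the host
// behind the runner, which may not be the machine running the Go test binary.
func makeOutputDir(r Runner) (string, error) {
	pwd, _, err := r.Run("pwd")
	if err != nil {
		return "", fmt.Errorf("failed to get working directory: %w", err)
	}
	// path.Join (not filepath.Join) is used because the target path lives
	// on the remote POSIX host.
	dir := path.Join(strings.TrimSpace(pwd), "test-output")
	if _, _, err := r.Run(fmt.Sprintf("mkdir -p %q", dir)); err != nil {
		return "", fmt.Errorf("failed to create %s: %w", dir, err)
	}
	return dir, nil
}

func main() {
	r := &recordingRunner{}
	dir, err := makeOutputDir(r)
	fmt.Println(dir, err, r.scripts)
}
```

Calling os.Getwd or os.MkdirAll here would act on the machine running the test binary, which is not necessarily the host where docker runs.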

The NVIDIA Container Toolkit now uses CDI injection by default, which does not trigger the code in the nvidia-container-cli that creates the empty files on the host.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

ArangoGutierrez requested review from elezar and Copilot June 30, 2025 12:19

ArangoGutierrez self-assigned this Jun 30, 2025

ArangoGutierrez force-pushed the b/5366608 branch 2 times, most recently from 80b5ea1 to 13ca3e3 Compare June 30, 2025 13:04

elezar reviewed Jun 30, 2025

ArangoGutierrez force-pushed the b/5366608 branch 7 times, most recently from 8778e11 to f47a6be Compare July 1, 2025 07:45

ArangoGutierrez requested review from elezar and Copilot July 1, 2025 08:00

elezar reviewed Jul 1, 2025

ArangoGutierrez force-pushed the b/5366608 branch from f47a6be to c2fa0b3 Compare July 1, 2025 15:32

ArangoGutierrez requested a review from elezar July 1, 2025 15:32

ArangoGutierrez force-pushed the b/5366608 branch from c2fa0b3 to 1499ac2 Compare July 2, 2025 07:04

elezar reviewed Jul 2, 2025

ArangoGutierrez force-pushed the b/5366608 branch from 1499ac2 to 18138b8 Compare July 2, 2025 09:05

ArangoGutierrez requested review from elezar and Copilot July 2, 2025 10:21

elezar reviewed Jul 2, 2025

ArangoGutierrez force-pushed the b/5366608 branch 3 times, most recently from 49f57d6 to 244939c Compare July 2, 2025 11:44

ArangoGutierrez requested review from elezar and Copilot July 2, 2025 14:20

Copilot AI reviewed Jul 2, 2025

elezar reviewed Jul 2, 2025

ArangoGutierrez force-pushed the b/5366608 branch from 244939c to e885760 Compare July 2, 2025 14:52

ArangoGutierrez requested a review from elezar July 2, 2025 14:56
