-
Notifications
You must be signed in to change notification settings - Fork 365
[no-relnote] Add test to check CDI injection by default #1169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Pull Request Test Coverage Report for Build 16028510793Details
💛 - Coveralls |
80b5ea1
to
13ca3e3
Compare
@@ -240,6 +240,22 @@ var _ = Describe("docker", Ordered, ContinueOnFailure, func() { | |||
BeforeAll(func(ctx context.Context) { | |||
_, _, err := runner.Run("docker pull ubuntu") | |||
Expect(err).ToNot(HaveOccurred()) | |||
|
|||
err = buildDockerImage( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test should be in its own section. We can then test the different behaviours based on runtime mode. For example if we run docker run --runtime=runc --gpus all
we would expect to see an error due to the path traversal.
tests/e2e/runner.go
Outdated
func buildDockerImage(runner Runner, tag string, args map[string]string, dockerfile string) error { | ||
var buildArgs string | ||
for k, v := range args { | ||
buildArgs += fmt.Sprintf("--build-arg %s=%q ", k, v) | ||
} | ||
|
||
command := fmt.Sprintf("docker build -t %s %s - <<EOF\n%s\nEOF", tag, buildArgs, dockerfile) | ||
|
||
_, _, err := runner.Run(command) | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the raw script is easier to follow.
_, _, err := runner.Run("docker run --rm --runtime=nvidia --gpus=all firmware-test") | ||
Expect(err).To(HaveOccurred()) | ||
|
||
containerOutput, _, err := runner.Run("docker run --rm --runtime=runc --gpus=all firmware-test") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is EXPECTED to fail.
_, _, err := runner.Run("docker run --rm --runtime=nvidia --gpus=all firmware-test") | ||
Expect(err).To(HaveOccurred()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the v1.18.0 release this uses CDI by default which means that there will be NO error. This should be a different test case.
8778e11
to
f47a6be
Compare
Expect(err).ToNot(HaveOccurred()) | ||
}) | ||
|
||
It("should not create empty firmware files on the host when using CDI", func(ctx context.Context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't actually check that the files are not created on this host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now edited the test description to be in line with the context
Expect(output).To(BeEmpty()) | ||
}) | ||
|
||
It("should fail when using the pre-start hook", func(ctx context.Context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
De we refer to this as "legacy" mode in other tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on another open PR we do add a "legacy" label https://github.com/NVIDIA/nvidia-container-toolkit/pull/1168/files#diff-d1fe210afb2df1ba6dd0ebbd1129033e4416de4d8146f52122811c41f081ae75R280
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
label added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not what I asked. In the other test that we explicitly want to trigger the legacy code path we use:
It("should work with nvidia-container-runtime-hook", func(ctx context.Context) {
Let's update this one to be:
It("should fail when using the nvidia-container-runtime-hook", func(ctx context.Context) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test description edited
f47a6be
to
c2fa0b3
Compare
c2fa0b3
to
1499ac2
Compare
It("should not fail when using CDI", func(ctx context.Context) { | ||
output, _, err := runner.Run("docker run --rm --runtime=nvidia --gpus=all firmware-test") | ||
Expect(err).ToNot(HaveOccurred()) | ||
Expect(output).To(BeEmpty()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's update the tests to actually check that no files are created. Using an affected version of the toolkit we see:
$ docker run --rm -ti --gpus=all --runtime=nvidia firmware-test true
$ ls
gsp_ga10x.bin gsp_tu10x.bin
So we would ensure that ls gsp_*.bin
is empty.
_, _, err := runner.Run("docker pull ubuntu") | ||
Expect(err).ToNot(HaveOccurred()) | ||
|
||
_, _, err = runner.Run(`docker build -t firmware-test --build-arg RM_VERSION="$(basename $(ls -d /lib/firmware/nvidia/*.*))" --build-arg CURRENT_DIR="$(pwd)" - <<EOF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we set CURRENT_DIR
to a tmp folder we can clean it up between each of the test cases to ensure that we're properly testing that we're NOT creating empty files on the host.
It("should fail when using the nvidia-container-runtime-hook", Label("legacy"), func(ctx context.Context) { | ||
_, stderr, err := runner.Run("docker run --rm --runtime=runc --gpus=all firmware-test") | ||
Expect(err).To(HaveOccurred()) | ||
Expect(stderr).To(ContainSubstring("nvidia-container-cli.real: mount error: path error:")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to this we should also check that files are not created.
1499ac2
to
18138b8
Compare
tmpDir, _, err = runner.Run("mktemp -d") | ||
Expect(err).ToNot(HaveOccurred()) | ||
tmpDir = strings.TrimSpace(tmpDir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there no per-test tmp dir that we can use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what the BeforeEach
is doing: it creates a new tmp dir per test (per If
block)
Expect(err).ToNot(HaveOccurred()) | ||
tmpDir = strings.TrimSpace(tmpDir) | ||
|
||
dockerBuildCmd := fmt.Sprintf(`docker build -t firmware-test --build-arg RM_VERSION="$(basename $(ls -d /lib/firmware/nvidia/*.*))" --build-arg CURRENT_DIR=%q - <<EOF`, tmpDir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does %q
use produce double quotes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup -> https://gobyexample.com/string-formatting
"To double-quote strings as in Go source, use %q."
49f57d6
to
244939c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the test runner to propagate stderr on failures and adds an end-to-end test validating firmware file creation under CDI vs. legacy runtimes.
- Propagate actual stderr from
runner.Run
when a script fails - Add new E2E tests to ensure CDI injection prevents host-side firmware file creation
- Import
fmt
to support formatted commands in tests
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
tests/e2e/runner.go | Return stderr.String() as the second value on errors |
tests/e2e/nvidia-container-toolkit_test.go | Introduce firmware-file-creation tests with setup and cleanup |
Comments suppressed due to low confidence (2)
tests/e2e/nvidia-container-toolkit_test.go:307
- This ignores the
err
return fromls
; adding_, _, err := runner.Run(...)
followed byExpect(err).ToNot(HaveOccurred())
will ensure any unexpected failures are caught.
output, _, _ := runner.Run(fmt.Sprintf("ls -A %s", outputDir))
tests/e2e/nvidia-container-toolkit_test.go:277
- [nitpick] The variable
dockerBuildDockerfile
holds the Dockerfile content; renaming it to something likedockerfileContent
could improve clarity.
FROM ubuntu
@@ -124,7 +124,7 @@ func (r remoteRunner) Run(script string) (string, string, error) { | |||
// Run the script | |||
err = session.Run(script) | |||
if err != nil { | |||
return "", "", fmt.Errorf("script execution failed: %v\nSTDOUT: %s\nSTDERR: %s", err, stdout.String(), stderr.String()) | |||
return "", stderr.String(), fmt.Errorf("script execution failed: %v\nSTDOUT: %s\nSTDERR: %s", err, stdout.String(), stderr.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Consider returning both stdout and stderr on error (e.g., return stdout.String(), stderr.String(), fmt.Errorf(...)
) so callers have full context for debugging.
return "", stderr.String(), fmt.Errorf("script execution failed: %v\nSTDOUT: %s\nSTDERR: %s", err, stdout.String(), stderr.String()) | |
return stdout.String(), stderr.String(), fmt.Errorf("script execution failed: %v\nSTDOUT: %s\nSTDERR: %s", err, stdout.String(), stderr.String()) |
Copilot uses AI. Check for mistakes.
|
||
AfterAll(func(ctx context.Context) { | ||
if outputDir != "" { | ||
runner.Run(fmt.Sprintf("rm -rf %s", outputDir)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cleanup command's error is ignored; you may want to capture and Expect(err).ToNot(HaveOccurred())
to surface any failures during teardown.
runner.Run(fmt.Sprintf("rm -rf %s", outputDir)) | |
_, _, err := runner.Run(fmt.Sprintf("rm -rf %s", outputDir)) | |
Expect(err).ToNot(HaveOccurred()) |
Copilot uses AI. Check for mistakes.
@@ -257,4 +258,54 @@ var _ = Describe("docker", Ordered, ContinueOnFailure, func() { | |||
Expect(libs).To(ContainElements([]string{"libcuda.so", "libcuda.so.1"})) | |||
}) | |||
}) | |||
|
|||
When("A container tries to create firmware files on the host", Ordered, func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe:
When("A container tries to create firmware files on the host", Ordered, func() { | |
When("Running a container where the firmware folder resolves outside the container root", Ordered, func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adopted
When("A container tries to create firmware files on the host", Ordered, func() { | ||
var outputDir string | ||
BeforeAll(func(ctx context.Context) { | ||
pwd, _, err := runner.Run("pwd") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the os
package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pwd, err := os.Getwd()
Expect(err).ToNot(HaveOccurred())
workingDir := filepath.Join(pwd, "test-output")
err = os.MkdirAll(workingDir, 0755)
Expect(err).ToNot(HaveOccurred())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember, we are running this over SSH, since the runner
interface is designed to be agnostic, we can not use Go packages for system-level actions, we need to use bash via the runner
Now that the NVIDIA Container Toolkit uses CDI injection by default. This does not trigger the code in the nvidia-container-cli that creates the empty files on the host. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
244939c
to
e885760
Compare
Now that the NVIDIA Container Toolkit uses CDI injection by default. This does not trigger the code in the nvidia-container-cli that creates the empty files on the host.