Refactor cdi api #1166

Open: wants to merge 3 commits into base: main

@elezar (Member) commented Jun 27, 2025

The original CDI spec generation API focused specifically on NVML devices. Since then, we have replaced the more specific functions in the API (for GPU and MIG devices) with more generally applicable functions based on mode and device IDs.

This organic growth of the API also means that, for the NVML case specifically, we ended up with multiple implementations of CDI spec generation, which makes it harder to keep things consistent.

These changes remove the redundant functions from nvcdi.Interface, allowing devices to be requested by ID across all use cases. They also refactor the CDI spec generation for NVML devices to ensure that the same generation logic is used in all cases.
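
As a rough illustration of what "requesting devices by ID across all use cases" could look like from a caller's side, here is a minimal sketch. The `Device` type, the trimmed-down `Interface`, the in-memory `fakeLib`, and the ID strings (`"0"`, `"0:0"`, `"all"`) are assumptions made for this example and are not taken from the PR's diff; the real `nvcdi.Interface` has more methods and returns CDI spec types.

```go
package main

import (
	"fmt"
	"sort"
)

// Device is a stand-in for a CDI device entry (assumption; the real type
// also carries container edits).
type Device struct {
	Name string
}

// Interface is a hypothetical, trimmed-down API that keeps only the
// by-ID entry point instead of separate GPU and MIG functions.
type Interface interface {
	GetDeviceSpecsByID(ids ...string) ([]Device, error)
}

// fakeLib serves device specs from an in-memory map keyed by device ID.
type fakeLib struct {
	devices map[string]Device
}

// GetDeviceSpecsByID returns the devices for the requested IDs. The special
// ID "all" expands to every known device, in sorted key order.
func (l *fakeLib) GetDeviceSpecsByID(ids ...string) ([]Device, error) {
	var out []Device
	for _, id := range ids {
		if id == "all" {
			keys := make([]string, 0, len(l.devices))
			for k := range l.devices {
				keys = append(keys, k)
			}
			sort.Strings(keys)
			for _, k := range keys {
				out = append(out, l.devices[k])
			}
			continue
		}
		d, ok := l.devices[id]
		if !ok {
			return nil, fmt.Errorf("unknown device ID %q", id)
		}
		out = append(out, d)
	}
	return out, nil
}

// newFakeLib builds a library with two full GPUs and one MIG device.
func newFakeLib() Interface {
	return &fakeLib{devices: map[string]Device{
		"0":   {Name: "0"},
		"0:0": {Name: "0:0"},
		"1":   {Name: "1"},
	}}
}

func main() {
	lib := newFakeLib()
	all, err := lib.GetDeviceSpecsByID("all")
	fmt.Println(len(all), err)
	mig, err := lib.GetDeviceSpecsByID("0:0")
	fmt.Println(mig, err)
}
```

With a single entry point like this, full GPUs, MIG devices, and the "everything" case all go through the same code path, which is the consistency benefit the description argues for.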

@elezar elezar force-pushed the refactor-cdi-api branch from 7389c4f to 4656c25 Compare June 27, 2025 12:46
@coveralls commented Jun 27, 2025

Pull Request Test Coverage Report for Build 16001110444

Details

  • 112 of 248 (45.16%) changed or added relevant lines in 12 files are covered.
  • 9 unchanged lines in 5 files lost coverage.
  • Overall coverage increased (+0.9%) to 34.13%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| pkg/nvcdi/lib-wsl.go | 0 | 3 | 0.0% |
| pkg/nvcdi/gds.go | 0 | 4 | 0.0% |
| pkg/nvcdi/lib-imex.go | 41 | 45 | 91.11% |
| pkg/nvcdi/management.go | 0 | 4 | 0.0% |
| pkg/nvcdi/mofed.go | 0 | 4 | 0.0% |
| pkg/nvcdi/wrapper.go | 12 | 17 | 70.59% |
| pkg/nvcdi/lib.go | 3 | 9 | 33.33% |
| pkg/nvcdi/lib-csv.go | 0 | 10 | 0.0% |
| pkg/nvcdi/full-gpu-nvml.go | 8 | 26 | 30.77% |
| pkg/nvcdi/mig-device-nvml.go | 14 | 33 | 42.42% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| pkg/nvcdi/lib-wsl.go | 1 | 0.0% |
| pkg/nvcdi/mig-device-nvml.go | 1 | 25.0% |
| pkg/nvcdi/lib-imex.go | 2 | 82.76% |
| pkg/nvcdi/wrapper.go | 2 | 70.21% |
| pkg/nvcdi/lib-nvml.go | 3 | 42.48% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 15971468852: | 0.9% |
| Covered Lines: | 4485 |
| Relevant Lines: | 13141 |

💛 - Coveralls

Copilot

This comment was marked as outdated.

@elezar elezar force-pushed the refactor-cdi-api branch 3 times, most recently from 0a2b879 to a6f8a10 Compare July 1, 2025 13:18
@elezar elezar added this to the v1.18.0 milestone Jul 1, 2025
@elezar elezar marked this pull request as ready for review July 1, 2025 13:19
@elezar elezar requested a review from cdesiniotis July 1, 2025 13:28
@elezar elezar force-pushed the refactor-cdi-api branch from a6f8a10 to 2683a6b Compare July 1, 2025 13:30
elezar added 3 commits July 1, 2025 15:38
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
@elezar elezar force-pushed the refactor-cdi-api branch from 2683a6b to 7753402 Compare July 1, 2025 13:43
@ArangoGutierrez ArangoGutierrez requested a review from Copilot July 1, 2025 16:11
Copilot AI left a comment

Pull Request Overview

Refactors the CDI API to use a unified factory-based SpecGenerator pattern and consolidates NVML spec generation logic.

  • Introduce deviceSpecGeneratorFactory, SpecGenerator, and DeviceSpecGenerator types and update all implementations accordingly
  • Replace redundant per-device methods with fullGPUDeviceSpecGenerator / migDeviceSpecGenerator and a combined DeviceSpecGenerators type
  • Update wrapper and CLI to use GetDeviceSpecsByID("all") and deprecate GetAllDeviceSpecs
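
The factory-based pattern summarized above might be sketched roughly as follows. The type names come from the overview, but every signature here is an assumption, and `Device`, `namedDevice`, `staticMode`, and `wrapper` are hypothetical stand-ins invented for the example rather than code from the PR.

```go
package main

import "fmt"

// Device is a stand-in for a CDI device entry (assumption).
type Device struct {
	Name string
}

// DeviceSpecGenerator produces the CDI device specs for one or more devices.
type DeviceSpecGenerator interface {
	GetDeviceSpecs() ([]Device, error)
}

// DeviceSpecGenerators combines several generators behind the same interface.
type DeviceSpecGenerators []DeviceSpecGenerator

// GetDeviceSpecs concatenates the specs of all contained generators.
func (g DeviceSpecGenerators) GetDeviceSpecs() ([]Device, error) {
	var out []Device
	for _, gen := range g {
		specs, err := gen.GetDeviceSpecs()
		if err != nil {
			return nil, err
		}
		out = append(out, specs...)
	}
	return out, nil
}

// deviceSpecGeneratorFactory is what each mode would implement
// (assumed signature).
type deviceSpecGeneratorFactory interface {
	DeviceSpecGenerators(ids ...string) (DeviceSpecGenerator, error)
}

// namedDevice is a trivial generator that emits a single device.
type namedDevice string

func (n namedDevice) GetDeviceSpecs() ([]Device, error) {
	return []Device{{Name: string(n)}}, nil
}

// staticMode is a hypothetical mode with a fixed list of known device IDs.
type staticMode struct {
	known []string
}

func (m staticMode) DeviceSpecGenerators(ids ...string) (DeviceSpecGenerator, error) {
	var gens DeviceSpecGenerators
	for _, id := range ids {
		if id == "all" {
			for _, k := range m.known {
				gens = append(gens, namedDevice(k))
			}
			continue
		}
		found := false
		for _, k := range m.known {
			found = found || k == id
		}
		if !found {
			return nil, fmt.Errorf("unknown device ID %q", id)
		}
		gens = append(gens, namedDevice(id))
	}
	return gens, nil
}

// wrapper delegates by-ID requests to whichever factory it was built with.
type wrapper struct {
	factory deviceSpecGeneratorFactory
}

func (w wrapper) GetDeviceSpecsByID(ids ...string) ([]Device, error) {
	gen, err := w.factory.DeviceSpecGenerators(ids...)
	if err != nil {
		return nil, err
	}
	return gen.GetDeviceSpecs()
}

func main() {
	w := wrapper{factory: staticMode{known: []string{"0", "1"}}}
	devices, err := w.GetDeviceSpecsByID("all")
	fmt.Println(devices, err)
}
```

In this arrangement the wrapper never needs to know which mode it is talking to; each mode only has to turn IDs into generators.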

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| pkg/nvcdi/wrapper.go | Added factory field and new SpecGenerator types |
| pkg/nvcdi/mofed.go | Implement factory interface for MOFED |
| pkg/nvcdi/mig-device-nvml.go | Extracted MIG spec generator type |
| pkg/nvcdi/management.go | Implement factory interface for management mode |
| pkg/nvcdi/lib.go | Wire deviceSpecGeneratorFactory into New |
| pkg/nvcdi/lib-wsl.go | Implement factory interface for WSL mode |
| pkg/nvcdi/lib-nvml_test.go | Add tests for NVML DeviceSpecGenerators |
| pkg/nvcdi/lib-nvml.go | Major refactor: split Init/Shutdown and generators |
| pkg/nvcdi/lib-imex.go | Implement factory interface for IMEX |
| pkg/nvcdi/lib-csv.go | Implement factory interface for CSV mode |
| pkg/nvcdi/gds.go | Implement factory interface for GDS mode |
| pkg/nvcdi/full-gpu-nvml.go | Extracted full GPU spec generator type |
| pkg/nvcdi/api.go | Updated Interface to use SpecGenerator |
| cmd/nvidia-ctk/cdi/generate/generate.go | Replace deprecated GetAllDeviceSpecs call |

Comments suppressed due to low confidence (4)

pkg/nvcdi/lib-nvml.go:108

  • [nitpick] The local variable DeviceSpecGenerators shadows the type DeviceSpecGenerators, which may confuse readers. Rename the variable (e.g., generators) to avoid shadowing.
	var DeviceSpecGenerators DeviceSpecGenerators
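
To make the shadowing concern concrete, here is a small sketch with a hypothetical `Generators` type (not the PR's `DeviceSpecGenerators`). Both functions compile and behave the same; the difference is that inside the first one the identifier `Generators` refers to the variable after its declaration, so the type can no longer be named in that scope.

```go
package main

import "fmt"

// Generators is a hypothetical slice type used only for this illustration.
type Generators []string

// collectShadowed declares a local variable with the same name as the type.
// After the declaration, "Generators" means the variable within this function.
func collectShadowed(ids []string) Generators {
	var Generators Generators
	for _, id := range ids {
		Generators = append(Generators, id)
	}
	return Generators
}

// collectRenamed does the same work with a lowercase variable name, so the
// type name stays usable and the code is easier to read.
func collectRenamed(ids []string) Generators {
	var generators Generators
	for _, id := range ids {
		generators = append(generators, id)
	}
	return generators
}

func main() {
	ids := []string{"0", "1"}
	fmt.Println(collectShadowed(ids), collectRenamed(ids))
}
```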

pkg/nvcdi/wrapper.go:39

  • [nitpick] Add a Go doc comment for deviceSpecGeneratorFactory to explain its role in producing DeviceSpecGenerator instances.
// TODO: Rename this type

pkg/nvcdi/lib-nvml.go:47

  • [nitpick] Expand this comment to specify the exact ID formats supported (e.g., gpuIndex, uuid, or gpuIndex:migIndex for MIG devices) so consumers know how to request each device.
// DeviceSpecGenerators returns the CDI device spec generators for NVML devices

pkg/nvcdi/lib-nvml_test.go:37

  • Consider adding a test case for invalid device IDs (e.g., an unsupported string) to verify that getDeviceSpecGeneratorsForIDs returns an appropriate error.
		expectedLength     int
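
A table-driven check for invalid IDs, as suggested above, could look roughly like the sketch below. `validateDeviceID` is a hypothetical stand-in for the real ID handling in `getDeviceSpecGeneratorsForIDs`, and the accepted formats (an index, `gpuIndex:migIndex`, a `GPU-`/`MIG-` prefixed UUID, or `all`) are assumptions based on the formats mentioned in the review comments. A real test would use the `testing` package and `t.Run` rather than `main`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// validateDeviceID is a hypothetical validator; the accepted formats are
// assumptions and may not match what the library actually supports.
func validateDeviceID(id string) error {
	switch {
	case id == "all":
		return nil
	case strings.HasPrefix(id, "GPU-"), strings.HasPrefix(id, "MIG-"):
		return nil
	}
	parts := strings.Split(id, ":")
	if len(parts) > 2 {
		return fmt.Errorf("invalid device ID %q", id)
	}
	for _, p := range parts {
		if n, err := strconv.Atoi(p); err != nil || n < 0 {
			return fmt.Errorf("invalid device ID %q", id)
		}
	}
	return nil
}

// testCase mirrors the shape of a table-driven test entry.
type testCase struct {
	description string
	id          string
	expectError bool
}

func testCases() []testCase {
	return []testCase{
		{description: "all devices", id: "all"},
		{description: "GPU index", id: "0"},
		{description: "MIG index pair", id: "0:1"},
		{description: "unsupported string", id: "not-a-device", expectError: true},
		{description: "empty ID", id: "", expectError: true},
		{description: "too many components", id: "0:1:2", expectError: true},
	}
}

func main() {
	for _, tc := range testCases() {
		err := validateDeviceID(tc.id)
		if (err != nil) != tc.expectError {
			fmt.Printf("FAIL %s: unexpected result %v\n", tc.description, err)
			continue
		}
		fmt.Printf("ok   %s\n", tc.description)
	}
}
```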


var _ DeviceSpecGenerator = (*migDeviceSpecGenerator)(nil)

func (l *nvmllib) newMIGDeviceSpecGeneratorFromNVMLDevice(id string, nvmlDevice nvml.Device) (DeviceSpecGenerator, error) {

Copilot AI commented Jul 1, 2025

The fields index and migIndex on migDeviceSpecGenerator are never initialized, causing getNames() to always use zero indices. Pass and set the correct index and migIndex when constructing the generator.

Suggested change
- func (l *nvmllib) newMIGDeviceSpecGeneratorFromNVMLDevice(id string, nvmlDevice nvml.Device) (DeviceSpecGenerator, error) {
+ func (l *nvmllib) newMIGDeviceSpecGeneratorFromNVMLDevice(id string, nvmlDevice nvml.Device, index int, migIndex int) (DeviceSpecGenerator, error) {
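
The effect described in this comment can be reproduced in isolation. The sketch below assumes a struct with `id`, `index`, and `migIndex` fields, as the comment describes; the constructors and the `indexName` method are hypothetical and stand in for the real constructor and `getNames()`, whose bodies are not shown in this conversation.

```go
package main

import "fmt"

// migDeviceSpecGenerator mirrors only the fields named in the review comment.
type migDeviceSpecGenerator struct {
	id       string
	index    int
	migIndex int
}

// newWithoutIndices sets only the ID, so index and migIndex keep Go's zero
// value of 0 regardless of which device was requested.
func newWithoutIndices(id string) *migDeviceSpecGenerator {
	return &migDeviceSpecGenerator{id: id}
}

// newWithIndices passes the indices through, as the suggested change proposes.
func newWithIndices(id string, index int, migIndex int) *migDeviceSpecGenerator {
	return &migDeviceSpecGenerator{id: id, index: index, migIndex: migIndex}
}

// indexName is a hypothetical stand-in for an index-based device name.
func (g *migDeviceSpecGenerator) indexName() string {
	return fmt.Sprintf("%d:%d", g.index, g.migIndex)
}

func main() {
	fmt.Println(newWithoutIndices("MIG-example").indexName())
	fmt.Println(newWithIndices("MIG-example", 1, 2).indexName())
}
```

Because Go zero-initializes omitted struct fields, the first constructor yields an index-based name built from zeros even for a device that is not the first MIG instance on GPU 0, which is the failure mode the comment warns about.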
