Add NVSwitch device ID for p6 instance type #2987
@@ -54,10 +54,12 @@ def _nvidia_driver_version

  # Get number of nv switches
  def get_nvswitches
-   # A100 (P4) and H100(P5) systems have NVSwitches
+   # A100 (P4), H100(P5) and B200(P6) systems have NVSwitches
    # NVSwitch device id is 10de:1af1 for P4 instance
    # NVSwitch device id is 10de:22a3 for P5 instance
-   nvswitch_check_p4 = shell_out("lspci -d 10de:1af1 | wc -l")
-   nvswitch_check_p5 = shell_out("lspci -d 10de:22a3 | wc -l")
-   nvswitch_check_p4.stdout.strip.to_i + nvswitch_check_p5.stdout.strip.to_i
+   # NVSwitch device id is 10de:2901 for P6 instance

Where did we get the device ID from?

This link describes the steps to find the ID: https://nvidia.custhelp.com/app/answers/detail/a_id/2040/~/identifying-the-graphics-card-model-and-device-id-in-a-pc

+
+   # We sum the count for all these deviceIds as output of lspci command will be >0
+   # for only one device ID based on the instance type
+   nvswitch_device_ids = ['10de:1af1', '10de:22a3', '10de:2901']
+   nvswitch_device_ids.sum { |id| shell_out("lspci -d #{id} | wc -l").stdout.strip.to_i }

Why are we summing the switch counts across all device IDs rather than returning the count for the specific instance type?

These device IDs are tied to the GPU in use, so the solution is independent of the instance type: we query the device IDs of the GPUs that we know have NVSwitches.

  end
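The summing approach discussed in this review can be tried outside Chef. The following is a minimal sketch, not the cookbook's code: `count_nvswitches`, `NVSWITCH_DEVICE_IDS`, and the injectable `runner` lambda are hypothetical names introduced here so the logic can run without Chef's `shell_out` or real hardware. The device IDs are the ones listed in the diff; the count of 4 in the simulated host is an arbitrary value, not a statement about any instance type.

```ruby
require 'open3'

# NVSwitch PCI device IDs, copied from the diff in this PR.
NVSWITCH_DEVICE_IDS = ['10de:1af1', '10de:22a3', '10de:2901'].freeze

# Hypothetical helper: sums the per-ID match counts reported by lspci.
# `runner` takes a shell command string and returns its stdout as a String.
# The default runner executes the command through the shell via Open3.
def count_nvswitches(runner = ->(cmd) { Open3.capture2(cmd).first })
  NVSWITCH_DEVICE_IDS.sum { |id| runner.call("lspci -d #{id} | wc -l").strip.to_i }
end

# Simulated host: only one device ID matches, as the code comment assumes.
fake_runner = lambda do |cmd|
  cmd.include?('10de:22a3') ? "4\n" : "0\n"
end

puts count_nvswitches(fake_runner) # prints 4
```

Because at most one device ID is expected to match on a given host, the sum equals the count for that host's ID, and it is 0 on hosts with no NVSwitches.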
Can we cover this change within the fabric manager spec test?
I will try to cover this function with a unit test.
Thanks for adding the unit test!
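
The unit test itself is not shown in this conversation. As a rough illustration of how `get_nvswitches` could be exercised without real hardware, here is a hedged sketch in plain Ruby. It does not use the cookbook's actual spec setup: `NvswitchCounter`, `FakeResult`, and the stubbed `shell_out` are hypothetical stand-ins, and the counts in the cases are arbitrary test values.

```ruby
# Stand-in for the object Chef's shell_out returns; only stdout is modeled.
FakeResult = Struct.new(:stdout)

# Hypothetical class holding a copy of the method from the diff.
class NvswitchCounter
  # counts_by_id maps a PCI device ID to the number of lines lspci would report.
  def initialize(counts_by_id)
    @counts_by_id = counts_by_id
  end

  # Stubbed shell_out: answers from the canned table instead of running lspci.
  def shell_out(command)
    id = command[/lspci -d (\S+)/, 1]
    FakeResult.new("#{@counts_by_id.fetch(id, 0)}\n")
  end

  def get_nvswitches
    nvswitch_device_ids = ['10de:1af1', '10de:22a3', '10de:2901']
    nvswitch_device_ids.sum { |id| shell_out("lspci -d #{id} | wc -l").stdout.strip.to_i }
  end
end

# Each case pairs canned lspci counts with the expected total.
cases = [
  [{}, 0],                   # no NVSwitch device present
  [{ '10de:1af1' => 6 }, 6], # only the first ID matches
  [{ '10de:2901' => 2 }, 2], # only the newly added ID matches
]

cases.each do |counts, expected|
  actual = NvswitchCounter.new(counts).get_nvswitches
  raise "expected #{expected}, got #{actual} for #{counts}" unless actual == expected
end
puts 'all cases passed'
```

In a real spec the same idea would apply: stub the shell command per device ID and assert on the returned sum.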