Skip to content

adding the MPS version of running_compute_processes for >= Volta. #112

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

devnexen
Copy link
Contributor

@devnexen devnexen commented Jul 2, 2025

No description provided.


nvml_try(sym(self.device, &mut hints, ptr::null_mut()))?;

hints.checked_shr(1);
Copy link
Contributor

@swlynch99 swlynch99 Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a typo. Did you mean to do this instead? I'm not familiar enough with this method to know if we should be shifting the hint flags here.

hints = hints.checked_shr(1);

Copy link
Contributor Author

@devnexen devnexen Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I just forgot to treat this part. You re almost right, since it s the checked flavor it Option<Self> return type

@devnexen devnexen force-pushed the mps_compute_running_processes branch 2 times, most recently from a97ff1b to 3b221e2 Compare July 2, 2025 23:14
Copy link
Contributor

@swlynch99 swlynch99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at bit more closely this seems completely wrong. See my review comments for what needs to be fixed.

The errors here are a bit weird to me. If you're using AI to generate these PRs that's completely fine, but please validate that the resulting code is using the APIs correctly by looking at the docs before submitting them.

Comment on lines 651 to 681
unsafe {
let mut hints: c_uint = 0;

nvml_try(sym(self.device, &mut hints, ptr::null_mut()))?;

let mut hints_shr = match hints.checked_shr(1) {
Some(h) => h,
None => hints,
};
let mut processes: Vec<nvmlProcessInfo_t> = Vec::with_capacity(hints_shr as usize);

nvml_try(sym(self.device, &mut hints_shr, processes.as_mut_ptr()))?;

processes.truncate(hints_shr as usize);
Ok(processes.into_iter().map(ProcessInfo::from).collect())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bit with checked_shr confused me so I went and took a look at the docs to confirm. This section is wrong.

The way this method is supposed to be used seems to be like this:

  • Call the function with count set to 0 and infos set to NULL
  • If the call doesn't error then we're done and know there are no processes, so can return an empty Vec
  • Otherwise it should return NVML_ERROR_INSUFFICIENT_SIZE and update count to the number of processes
  • You should then be using count to allocate an array and passing that in for a second call so that it stores the data into the array.
  • Finally, truncate does not extend vector, you need set_len for that.

So in order:

  • hints is absolutely the wrong variable name. Use len.
  • Drop all the shifting, it makes no sense.
  • You need to handle the relevant errors from the first call to nvmlDeviceGetMPSComputeRunningProcesses_v3. If it succeeds you should just be returning an empty Vec.
  • truncate does not do what you want here, use set_len instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good points !


Supports Volta or newer fully supported devices.
*/
#[doc(alias = "nvmlDeviceGetiMPSComputeRunningProcesses_v3")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've got a typo here

Suggested change
#[doc(alias = "nvmlDeviceGetiMPSComputeRunningProcesses_v3")]
#[doc(alias = "nvmlDeviceGetMPSComputeRunningProcesses_v3")]

@devnexen devnexen force-pushed the mps_compute_running_processes branch from 3b221e2 to 94f8239 Compare July 3, 2025 04:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants