WSL2 CUDA Driver 465.42 not working with Nvidia’s CUDA 11.1 and higher #1458
Comments
@dualvtable Is this related to the issue with forward-compatibility not working in certain container images?
I ran into this issue as well; it seems related to NVIDIA/nvidia-container-toolkit#148. Edit: this is likely a bug in detecting the driver version, since I was able to work around it by disabling requirements checking:

user@host:/usr/local/cuda/samples/1_Utilities/deviceQuery$ docker run --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -it nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04 nvidia-smi
Thu Feb 11 23:17:43 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.00 Driver Version: 465.42 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 00000000:01:00.0 On | N/A |
| 21% 56C P0 43W / 215W | 853MiB / 8192MiB | ERR! Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1080 Off | 00000000:02:00.0 Off | N/A |
| 0% 45C P8 8W / 215W | 113MiB / 8192MiB | ERR! Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Hey @feynmanliang @levipereira. Thanks for reporting the issue. We have a fix in progress to address the fact that we report CUDA version 11.0 on WSL. For reference: here is the merge request extending WSL support.
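In the meantime, a quick way to compare what the driver reports against what the container stack detects (a sketch, assuming the standard NVIDIA tooling is installed in the WSL2 distro):

# Driver version as reported inside WSL2:
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# Driver/CUDA version as detected by the container runtime (used for the cuda>= check):
nvidia-container-cli info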
Thanks for the info. We have been facing the same issue since January. Looking forward to the patch.
@elezar I saw the fix has been internally approved. Does that mean it will ship with CUDA 11.2 Update 2 or similar? If so, do you have any idea when it will be released? Thanks :)
Hi @willemavjc. This will ship with the next version of the nvidia-container-toolkit that goes out. We are working on getting a release out, but don't yet have a date for it. The release process is separate from the CUDA release process.
@elezar Is this in any way related to the Windows NVIDIA driver superseding the WSL-CUDA driver? Are we closing in on an official release?
The Insiders Preview 470.05 driver has some issues (it requires disabling/re-enabling the Display Adapter in Device Manager before ...
@feynmanliang I am... you could say very aware of that post. Like the most aware of anyone on the planet. I could not possibly be more aware if I had written it.
CUDA 11.2 Update 2 is out; see the release notes here: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html. Not tested yet, but I'm eager to check whether the fix pointed out by @elezar is included.
From what I'm seeing, this is not fixed in CUDA 11.2 Update 2.
@elezar mentioned nvidia-container-toolkit, which had its last release on Feb 5th: https://github.com/NVIDIA/nvidia-container-toolkit/releases. If I understand this correctly, nvidia-container-toolkit must release first, and then the drivers need to be updated to use it?
It works if you use the following flag:
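Presumably this refers to the NVIDIA_DISABLE_REQUIRE=1 environment variable shown earlier in the thread; a minimal invocation under that assumption:

# Skips nvidia-container-cli's cuda>= requirement check; use with care,
# since it bypasses a genuine compatibility safeguard.
docker run --rm --gpus all --env NVIDIA_DISABLE_REQUIRE=1 \
  nvidia/cuda:11.1.1-cudnn8-devel-ubuntu18.04 nvidia-smi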
Ah, that it does! Interesting... Now I just need to figure out how to build my own docker containers that take the drivers from the devel or runtime containers... |
We have not yet released the fix mentioned in the comments above and will post here once this is done. Using NVIDIA_DISABLE_REQUIRE=1 is a valid workaround in the meantime.
Pretty easy, in fact. We did the same to match our own "security" rules here. Have a look at the 3 Dockerfiles and grab the libs from them; add them to whatever your OS container is. Works like a charm.
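A minimal multi-stage Dockerfile sketch of that approach; the image tag and library paths here are illustrative assumptions, so verify them against the upstream nvidia/cuda Dockerfiles before relying on this:

# Sketch only: copy the CUDA toolkit from the official runtime image into your own base.
FROM nvidia/cuda:11.2.0-runtime-ubuntu20.04 AS cuda
FROM ubuntu:20.04
COPY --from=cuda /usr/local/cuda /usr/local/cuda
ENV PATH=/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64
# These tell the NVIDIA container runtime which devices and capabilities to expose:
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility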
Is this bug fixed? I'm not even able to run this Docker container, which should work:

root@DESKTOP-N9UN2H3:/mnt/c/Program Files/cmder# docker run --rm --gpus all nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04
Unable to find image 'nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04' locally

In addition:

root@DESKTOP-N9UN2H3:/mnt/c/Program Files/cmder# nvidia-smi
Failed to properly shut down NVML: Driver Not Loaded

(I'm using Windows 10 build 21376co_release.210503-1432; on the host I have installed NVIDIA driver version 470.14, and inside WSL I have installed Ubuntu 20.04 with cuda-toolkit-11-0.) This is the tutorial that I have read:

root@DESKTOP-N9UN2H3:/mnt/c/Program Files/cmder# docker --version
@Marietto2008 there is an issue with NVML in the 470 driver (#1496 (comment)) that will be addressed soon. |
Driver 471.11 fixed the problems! |
1. Issue or feature description
The title explains most of the issue, but to give more context: my Windows Insider build is Build 21301.rs_prerelease.210123-1645.
My NVIDIA drivers are also updated to the latest, driver version 465.42:
I also have the latest Docker Desktop with WSL2 Integration for the GPU!
2. Steps to reproduce the issue
When I launch this, it works:
docker run --rm --gpus all nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04
BUT, when I launch this:
docker run --rm --gpus all nvidia/cuda:11.1.1-cudnn8-devel-ubuntu18.04
I get this:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.1, please update your driver to a newer version, or use an earlier cuda container: unknown.
https://forums.developer.nvidia.com/t/wsl2-cuda-driver-465-42-not-working-with-nvidias-cuda-11-1-1-docker-containers/167662
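For background: the cuda>=11.1 condition in that error comes from the NVIDIA_REQUIRE_CUDA environment variable baked into the image, which nvidia-container-cli checks against the CUDA version it detects from the driver. You can see the requirement string with:

docker inspect --format '{{.Config.Env}}' nvidia/cuda:11.1.1-cudnn8-devel-ubuntu18.04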
3. Information to attach (optional if deemed irrelevant)
nvidia-container-cli -k -d /dev/tty info
uname -a
dmesg
nvidia-smi -a
docker version
dpkg -l '*nvidia*'
or rpm -qa '*nvidia*'
nvidia-container-cli -V