
nvidia-container-cli reports incorrect CUDA driver version on WSL2 #148

Open · danfairs opened this issue Nov 8, 2020 · 15 comments

@danfairs commented Nov 8, 2020

1. Issue or feature description

nvidia-container-cli on WSL2 is reporting CUDA 11.0 (and thus refusing to run containers that require cuda>=11.1) even though CUDA toolkit 11.1 is installed in Linux. Windows 10 is build 20251.fe_release.201030-1438. Everything is installed as per the install guide, and CUDA containers do actually work (for example, docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark successfully returns a benchmark).

Machine is a Dell XPS 15 9500 with an i9-10885H CPU, 64 GB RAM and an NVIDIA GeForce GTX 1650 Ti.

2. Steps to reproduce the issue

  1. Install Windows 10 via the Insider Program, at build 20251.fe_release.201030-1438 or later
  2. Install the Windows CUDA drivers from here (this is 460.20 for me)
  3. Install Ubuntu 20.04, the CUDA toolkit 11.1 and the container runtime as per the nvidia docs
  4. Run nvidia-smi on the host - it should give a CUDA version of 11.2.
  5. Check docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark correctly outputs benchmarks
  6. In Linux, run nvidia-container-cli info. It incorrectly outputs CUDA version 11.0.

This command will also fail:

$ docker run --gpus all --rm -it nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 /bin/bash
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.1, please update your driver to a newer version, or use an earlier cuda container\\\\n\\\"\"": unknown.
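
To make the mismatch explicit, here are the two version checks side by side (the grep filters are only for readability; the 11.2 and 11.0 values are from the runs described above):

$ nvidia-smi | grep "CUDA Version"                  # driver-reported version: 11.2
$ nvidia-container-cli info | grep "CUDA version"   # container CLI reports: 11.0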

3. Information to attach (optional if deemed irrelevant)

  • Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info (attached: ncc.txt)

  • Kernel version from uname -a: Linux aphid 5.4.72-microsoft-standard-WSL2 #1 SMP Wed Oct 28 23:40:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  • Any relevant kernel output lines from dmesg

  • Driver information from nvidia-smi -a (attached: nvidia-smi.txt)

  • Docker version from docker version: 19.03.13

  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*' (attached: packages.txt)

  • NVIDIA container library version from nvidia-container-cli -V (attached: ncc-version.txt)

  • NVIDIA container library logs (see troubleshooting)

  • Docker command, image and tag used

$ docker run --gpus all --rm -it nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 /bin/bash 2>&1 (output attached: docker-run.txt)
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.1, please update your driver to a newer version, or use an earlier cuda container\\\\n\\\"\"": unknown.
@opptimus commented Nov 12, 2020

Same here:

Status: Downloaded newer image for nvidia/cuda:10.2-base
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused "process_linux.go:432: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\n\""": unknown.

@klueska (Contributor) commented Nov 12, 2020

@opptimus seems to have a different issue, but the original issue may be related to:
NVIDIA/libnvidia-container#117 (comment)

@danfairs (Author)

@klueska To be fair, @opptimus' issue is the one I actually bumped into to start with. It was only after further digging that I realised nvidia-container-cli was also reporting the wrong version. I may be putting the cart before the horse; I'm pretty new to this :)

@opptimus

@danfairs I solved my problem by upgrading my Win10 to version 20257.1, following the official WSL2 guidelines.
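
If it helps anyone else, a quick way to confirm what you are actually running (assuming a standard WSL2 setup) is:

# On the Windows side: run `winver` (or `cmd /c ver`) to check the build number.
# Inside the WSL2 distro:
uname -r       # should show a *-microsoft-standard-WSL2 kernel
nvidia-smi     # driver and CUDA version as exposed to WSL2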

@elezar (Member) commented Feb 12, 2021

Hey @danfairs. Thanks for reporting the issue. We have a fix in progress to address the fact that we report CUDA version 11.0 on WSL.

In the meantime you could use the NVIDIA_DISABLE_REQUIRE environment variable to skip the CUDA version check.

docker run --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -it nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04 nvidia-smi

For reference: here is the merge request extending WSL support.

@archee8 commented May 4, 2021

> Hey @danfairs. Thanks for reporting the issue. We have a fix in progress to address the fact that we report CUDA version 11.0 on WSL.
>
> In the meantime you could use the NVIDIA_DISABLE_REQUIRE environment variable to skip the CUDA version check.
>
> docker run --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -it nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04 nvidia-smi
>
> For reference: here is the merge request extending WSL support.

Hi. I have a problem with nvidia-container-cli. I run this:

archee8@DESKTOP-HR2MA0D:~$ docker run --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -it nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04 nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.

@elezar (Member) commented May 5, 2021

@archee8 which version of the NVIDIA container toolkit is this?

Version 1.4.0 of libnvidia-container should address this issue.
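
To check what is installed and pull in the latest packages, something along these lines should work (assuming the NVIDIA apt repository from the install guide is already configured):

nvidia-container-cli -V                       # library/CLI version
apt-cache policy libnvidia-container-tools    # installed vs. candidate package version
sudo apt-get update
sudo apt-get install --only-upgrade libnvidia-container1 libnvidia-container-tools nvidia-container-toolkit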

@archee8 commented May 5, 2021

> @archee8 which version of the NVIDIA container toolkit is this?
>
> Version 1.4.0 of libnvidia-container should address this issue.

archee8@DESKTOP-HR2MA0D:~$ sudo apt-cache policy libnvidia-container-tools
libnvidia-container-tools:
  Installed: 1.4.0-1

@klueska (Contributor) commented May 5, 2021

@archee8 Your issue appears to be related to this:
NVIDIA/nvidia-docker#1496 (comment)

@Keiku commented Mar 24, 2022

The following command works, but it doesn't work with docker-compose. Does anyone know the cause?

docker run --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -it nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04 nvidia-smi

I have the following environment. The reason for Ubuntu 16.04 is that it cannot be upgraded due to company security issues.

⋊> ~ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.7 LTS
Release:        16.04
Codename:       xenial
⋊> ~ docker --version
Docker version 20.10.7, build f0df350
⋊> ~ docker-compose --version
docker-compose version 1.29.2, build unknown
⋊> ~ nvidia-container-cli info
NVRM version:   440.118.02
CUDA version:   10.2

Device Index:   0
Device Minor:   0
Model:          TITAN X (Pascal)
Brand:          GeForce
GPU UUID:       GPU-fcae2b3c-b6c0-c0c6-1eef-4f25809d16f9
Bus Location:   00000000:01:00.0
Architecture:   6.1
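
For reference, the docker-compose setup I would expect to be equivalent to the docker run command above looks roughly like this (service name and file layout are just an example; the deploy/devices GPU syntax needs docker-compose 1.28 or newer):

cat > docker-compose.yml <<'EOF'
version: "3.8"
services:
  cuda-test:
    image: nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04
    command: nvidia-smi
    environment:
      - NVIDIA_DISABLE_REQUIRE=1
    deploy:
      resources:
        reservations:
          devices:
            # request all NVIDIA GPUs for this service
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF
docker-compose up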

@andresgalaviz

This issue is still present when following the current instructions on the official nvidia documentation for this: https://docs.nvidia.com/cuda/wsl-user-guide/index.html#ch05-running-containers

@psychofisch

While trying to run https://github.com/borisdayma/dalle-mini in WSL2 I encountered the same error message as @danfairs:

root@DESKTOP-DEADBEEF:/mnt/g/github/dalle-mini# docker run --rm --name dallemini --gpus all -it -p 8888:8888 -v "${PWD}":/workspace dalle-mini:latest
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.6, please update your driver to a newer version, or use an earlier cuda container: unknown.

When I check my currently installed version with nvidia-smi I see that I have CUDA 11.7 installed (the error message above requires 11.6):

root@DESKTOP-DEADBEEF:/mnt/g/github/dalle-mini# nvidia-smi
Mon Jun 13 23:34:16 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.05    Driver Version: 516.01       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:26:00.0  On |                  N/A |
|  0%   38C    P8     8W / 175W |   1082MiB /  8192MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I'm kinda stuck right now. Any advice?

@elezar (Member) commented Jun 14, 2022

@psychofisch as a workaround please start the container with NVIDIA_DISABLE_REQUIRE=true:

docker run --rm --name dallemini --gpus all -it -p 8888:8888 -v "${PWD}":/workspace -e NVIDIA_DISABLE_REQUIRE=true dalle-mini:latest

@TheFrator

> @psychofisch as a workaround please start the container with NVIDIA_DISABLE_REQUIRE=true:
>
> docker run --rm --name dallemini --gpus all -it -p 8888:8888 -v "${PWD}":/workspace -e NVIDIA_DISABLE_REQUIRE=true dalle-mini:latest

I ran into this issue and this workaround worked. Thank you @elezar

@mirekphd commented Jan 7, 2023

Sorry, but I'm not at all convinced NVIDIA_DISABLE_REQUIRE should be used. The container will start, true, but ML algos will fail to train the model later on (if they are properly directed to use the GPU, without automatic failover to the CPU). In my experience the CUDA versions on the host and in the container must be in sync, just like glibc versions. IOW, CUDA Minor Version Compatibility (as described in the docs here) is a bit of wishful thinking...

The most precise error message resulting from the use of NVIDIA_DISABLE_REQUIRE is given by CatBoost:

CatBoostError: catboost/cuda/cuda_lib/cuda_base.h:281: CUDA error 803: system has unsupported display driver / cuda driver combination
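
In other words, the container starting is not proof that CUDA actually works; only a real GPU workload is. A quick sanity check along these lines (reusing the images already mentioned in this thread) makes the difference visible:

# The container starts fine with the requirement check disabled...
docker run --rm --gpus all -e NVIDIA_DISABLE_REQUIRE=1 nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 nvidia-smi
# ...but only running a real CUDA kernel shows whether the driver/runtime combination is usable:
docker run --rm --gpus all -e NVIDIA_DISABLE_REQUIRE=1 nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark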
