
Error "no CUDA-capable device is detected" when running a CUBLAS-compiled version of llama-cpp-python #880

Open

Description

Env

  • WSL 2
  • Nvidia driver installed
  • CUDA support installed via pip install torch torchvision torchaudio, which also pulls in the nvidia-cuda-* runtime wheels
  • llama-cpp-python build command: CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python[server] --force-reinstall --upgrade --no-cache-dir (see the toolkit check sketched after this list)
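
Note that the pip-installed nvidia-cuda-* wheels only drop runtime libraries into site-packages for torch to load; they typically do not put nvcc or a full CUDA toolkit where CMake can find it, which the CUBLAS build needs. A minimal sketch for checking what is visible (the paths below are common defaults, not guaranteed on every setup):

import os, shutil

# The CUBLAS build of llama.cpp needs a system CUDA toolkit that CMake
# can locate; the pip runtime wheels alone are typically not enough.
print("nvcc on PATH:", shutil.which("nvcc"))        # None -> no toolkit on PATH
print("CUDA_HOME:", os.environ.get("CUDA_HOME"))    # commonly /usr/local/cuda
print("/usr/local/cuda exists:", os.path.isdir("/usr/local/cuda"))
# On WSL 2 the driver library is mounted from Windows:
print("WSL libcuda present:", os.path.exists("/usr/lib/wsl/lib/libcuda.so.1"))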

Problem Reproduction

Executing python -m llama_cpp.server --model yarn-mistral-7b-128k.Q5_K_M.gguf fails with:

CUDA error 100 at /tmp/pip-install-hjlvezud/llama-cpp-python_b986d017976f49d0bf4e93e3963398af/vendor/llama.cpp/ggml-cuda.cu:5823: no CUDA-capable device is detected
current device: 0
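
CUDA error 100 is CUDA_ERROR_NO_DEVICE from the CUDA driver API: the driver library loaded, but it reports no usable GPU. The same check can be run independently of llama.cpp with a small ctypes sketch (libcuda.so.1 is the usual Linux soname; on WSL 2 it normally resolves to /usr/lib/wsl/lib):

import ctypes

# Probe the driver directly; the CUDA runtime used by ggml-cuda sits on top of it.
libcuda = ctypes.CDLL("libcuda.so.1")

rc = libcuda.cuInit(0)      # CUresult cuInit(unsigned int Flags)
print("cuInit ->", rc)      # 0 = CUDA_SUCCESS, 100 = CUDA_ERROR_NO_DEVICE

if rc == 0:
    count = ctypes.c_int(0)
    libcuda.cuDeviceGetCount(ctypes.byref(count))
    print("devices:", count.value)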

nvidia-smi output:

Mon Nov  6 23:21:17 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 536.23       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:07:00.0  On |                  N/A |
| 30%   40C    P8              13W / 170W |   1860MiB / 12288MiB |     11%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

CUDA verification

with torch:

Python 3.8.18 (default, Sep 11 2023, 13:40:15)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.12.3 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import torch

In [2]: torch.cuda.is_available()
Out[2]: True

In [3]: torch.cuda.get_device_properties(0)
Out[3]: _CudaDeviceProperties(name='NVIDIA GeForce RTX 3060', major=8, minor=6, total_memory=12287MB, multi_processor_count=28)

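A passing torch check does not by itself rule out an environment problem: the torch wheels bundle their own CUDA runtime via the nvidia-* packages, so torch.cuda.is_available() can be True while a separately compiled binary still fails. A sketch of what to compare from the same interpreter (CUDA_HOME here is torch's own toolkit-detection heuristic):

import torch
import torch.utils.cpp_extension

print(torch.version.cuda)                   # CUDA runtime version bundled with the wheel
print(torch.utils.cpp_extension.CUDA_HOME)  # system toolkit torch can find; None here
                                            # hints that no toolkit is installed system-wide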

with pip:

# pip list | grep cublas
nvidia-cublas-cu12       12.1.3.1
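
It can also help to check which CUDA libraries the compiled extension actually resolves at load time. A sketch using ldd (the shared-library name and location inside the package vary by llama-cpp-python version, so the glob below is an assumption):

import pathlib, subprocess
import llama_cpp

# Find the compiled shared libraries shipped inside the installed package
# and show what the dynamic linker would bind them to.
pkg = pathlib.Path(llama_cpp.__file__).parent
for lib in pkg.glob("**/*.so"):
    print(lib)
    print(subprocess.run(["ldd", str(lib)], capture_output=True, text=True).stdout)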

Labels: bug (Something isn't working), documentation (Improvements or additions to documentation)
