
Error "no CUDA-capable device is detected" when running a CUBLAS-compiled version of llama-cpp-python #880

Open

Description

Env

  • WSL 2
  • Nvidia driver installed
  • CUDA support installed via pip install torch torchvision torchaudio, which also pulls in the nvidia-cuda-* runtime wheels
  • llama-cpp-python build command: CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python[server] --force-reinstall --upgrade --no-cache-dir (see the toolkit check sketched after this list)
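
Note that the pip-installed nvidia-cuda-* wheels only drop runtime libraries into site-packages for torch to load; they typically do not put nvcc or a full CUDA toolkit where CMake can find it, which the CUBLAS build needs. A minimal sketch for checking what is visible (the paths below are common defaults, not guaranteed on every setup):

import os, shutil

# The CUBLAS build of llama.cpp needs a system CUDA toolkit that CMake
# can locate; the pip runtime wheels alone are typically not enough.
print("nvcc on PATH:", shutil.which("nvcc"))        # None -> no toolkit on PATH
print("CUDA_HOME:", os.environ.get("CUDA_HOME"))    # commonly /usr/local/cuda
print("/usr/local/cuda exists:", os.path.isdir("/usr/local/cuda"))
# On WSL 2 the driver library is mounted from Windows:
print("WSL libcuda present:", os.path.exists("/usr/lib/wsl/lib/libcuda.so.1"))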

Problem Reproduction

Executing python -m llama_cpp.server --model yarn-mistral-7b-128k.Q5_K_M.gguf fails with:

CUDA error 100 at /tmp/pip-install-hjlvezud/llama-cpp-python_b986d017976f49d0bf4e93e3963398af/vendor/llama.cpp/ggml-cuda.cu:5823: no CUDA-capable device is detected
current device: 0
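
CUDA error 100 is CUDA_ERROR_NO_DEVICE from the CUDA driver API: the driver library loaded, but it reports no usable GPU. The same check can be run independently of llama.cpp with a small ctypes sketch (libcuda.so.1 is the usual Linux soname; on WSL 2 it normally resolves to /usr/lib/wsl/lib):

import ctypes

# Probe the driver directly; the CUDA runtime used by ggml-cuda sits on top of it.
libcuda = ctypes.CDLL("libcuda.so.1")

rc = libcuda.cuInit(0)      # CUresult cuInit(unsigned int Flags)
print("cuInit ->", rc)      # 0 = CUDA_SUCCESS, 100 = CUDA_ERROR_NO_DEVICE

if rc == 0:
    count = ctypes.c_int(0)
    libcuda.cuDeviceGetCount(ctypes.byref(count))
    print("devices:", count.value)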

nvidia-smi output:

Mon Nov  6 23:21:17 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 536.23       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:07:00.0  On |                  N/A |
| 30%   40C    P8              13W / 170W |   1860MiB / 12288MiB |     11%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

CUDA verification

with torch:

Python 3.8.18 (default, Sep 11 2023, 13:40:15)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.12.3 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import torch

In [2]: torch.cuda.is_available()
Out[2]: True

In [3]: torch.cuda.get_device_properties(0)
Out[3]: _CudaDeviceProperties(name='NVIDIA GeForce RTX 3060', major=8, minor=6, total_memory=12287MB, multi_processor_count=28)

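A passing torch check does not by itself rule out an environment problem: the torch wheels bundle their own CUDA runtime via the nvidia-* packages, so torch.cuda.is_available() can be True while a separately compiled binary still fails. A sketch of what to compare from the same interpreter (CUDA_HOME here is torch's own toolkit-detection heuristic):

import torch
import torch.utils.cpp_extension

print(torch.version.cuda)                   # CUDA runtime version bundled with the wheel
print(torch.utils.cpp_extension.CUDA_HOME)  # system toolkit torch can find; None here
                                            # hints that no toolkit is installed system-wide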

with pip:

# pip list | grep cublas
nvidia-cublas-cu12       12.1.3.1
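
It can also help to check which CUDA libraries the compiled extension actually resolves at load time. A sketch using ldd (the shared-library name and location inside the package vary by llama-cpp-python version, so the glob below is an assumption):

import pathlib, subprocess
import llama_cpp

# Find the compiled shared libraries shipped inside the installed package
# and show what the dynamic linker would bind them to.
pkg = pathlib.Path(llama_cpp.__file__).parent
for lib in pkg.glob("**/*.so"):
    print(lib)
    print(subprocess.run(["ldd", str(lib)], capture_output=True, text=True).stdout)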

Labels: bug (Something isn't working), documentation (Improvements or additions to documentation)
