
llama.cpp problem (GPU support) #509

Open
@xajanix

Description

Hello, I am a complete newbie when it comes to the subject of LLMs.
I installed a GGML model in the oobabooga webui and tried to use it. It works fine, but only from RAM: it uses only 0.5 GB of VRAM, and I have no way to change that (offload some layers to the GPU); even passing "--n-gpu-layers 10" in the webui doesn't work. So I started searching, and one of the answers is this command sequence:

pip uninstall -y llama-cpp-python
set CMAKE_ARGS="-DLLAMA_CUBLAS=on"
set FORCE_CMAKE=1
pip install llama-cpp-python --no-cache-dir

But that doesn't work for me. After pasting it, I get:

 [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

And it completely broke the llama folder: it uninstalled llama-cpp-python and did nothing more. I had to update the webui and download llama.cpp again to fix it, because I have no other way to reinstall it.
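If I understand cmd's "set" correctly, the quotes in the CMAKE_ARGS line above end up inside the variable's value, so the unquoted form is probably what was meant (just my guess at the intended commands):

rem same reinstall, but without quotes so cmd does not store them as part of the value
pip uninstall -y llama-cpp-python
set CMAKE_ARGS=-DLLAMA_CUBLAS=on
set FORCE_CMAKE=1
pip install llama-cpp-python --no-cache-dir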

I also tried the compile-from-source method, but that didn't work either. When I paste CMAKE_ARGS="-DLLAMA_OPENBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python into CMD / the CMD window in oobabooga, I always get this message:

'CMAKE_ARGS' is not recognized as an internal or external command,
operable program or batch file.

or

'FORCE_CMAKE' is not recognized as an internal or external command,
operable program or batch file.

The same happens with the "make" command: it is not recognized, even though I have make and CMake installed.
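As far as I can tell, the CMAKE_ARGS=... FORCE_CMAKE=1 pip install ... one-liner is bash syntax, which cmd does not understand (hence "is not recognized as an internal or external command"), so in cmd it would presumably have to be split into separate "set" lines, the same way as above:

rem bash-style "VAR=value command" split into separate cmd statements
set CMAKE_ARGS=-DLLAMA_OPENBLAS=on
set FORCE_CMAKE=1
pip install llama-cpp-python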

Also, when I launch the webui and choose a GGML model, I get something like this in the console:

llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 0.14 MB
llama_model_load_internal: mem required = 19712.68 MB (+ 3124.00 MB per state)
llama_new_context_with_model: kv self size = 3120.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
2023-07-19 23:05:22 INFO:Loaded the model in 8.17 seconds.

I am using Windows and an Nvidia card.

Is there an easy way to enable offloading layers to the GPU that doesn't require installing a ton of stuff?
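For reference, what I am ultimately trying to do (assuming a llama-cpp-python build with GPU support) is just to pass the flag I mentioned when launching the webui, something like:

rem hypothetical launch line; it only takes effect if llama-cpp-python was built with cuBLAS
python server.py --n-gpu-layers 10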

Labels: bug (Something isn't working), hardware (Hardware specific issue), llama.cpp (Problem with llama.cpp shared lib)
