
llama.cpp problem (GPU support) #509

Open
@xajanix

Description

Hello, I am a complete newbie when it comes to the subject of LLMs.
I installed a GGML model in the oobabooga webui and tried to use it. It works fine, but only from RAM: it uses only 0.5 GB of VRAM, and I have no way to change that (offload some layers to the GPU); even passing "--n-gpu-layers 10" in the webui doesn't work. So I started searching, and one of the answers is this command sequence:

pip uninstall -y llama-cpp-python
set CMAKE_ARGS="-DLLAMA_CUBLAS=on"
set FORCE_CMAKE=1
pip install llama-cpp-python --no-cache-dir

But that doesn't work for me. After pasting it, I get:

 [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

And it completely broke the llama folder: it uninstalled llama-cpp-python and did nothing more. I had to update the webui and download llama.cpp again to fix it, because I have no other way to reinstall it.
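If I understand cmd's "set" correctly, the quotes in the CMAKE_ARGS line above end up inside the variable's value, so the unquoted form is probably what was meant (just my guess at the intended commands):

rem same reinstall, but without quotes so cmd does not store them as part of the value
pip uninstall -y llama-cpp-python
set CMAKE_ARGS=-DLLAMA_CUBLAS=on
set FORCE_CMAKE=1
pip install llama-cpp-python --no-cache-dir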

I also tried the compile-from-source method, but that didn't work either. When I paste CMAKE_ARGS="-DLLAMA_OPENBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python into CMD / the CMD window in oobabooga, I always get this message:

'CMAKE_ARGS' is not recognized as an internal or external command,
operable program or batch file.

or

'FORCE_CMAKE' is not recognized as an internal or external command,
operable program or batch file.

The same happens with the "make" command: it is not recognized, even though I have make and CMake installed.
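As far as I can tell, the CMAKE_ARGS=... FORCE_CMAKE=1 pip install ... one-liner is bash syntax, which cmd does not understand (hence "is not recognized as an internal or external command"), so in cmd it would presumably have to be split into separate "set" lines, the same way as above:

rem bash-style "VAR=value command" split into separate cmd statements
set CMAKE_ARGS=-DLLAMA_OPENBLAS=on
set FORCE_CMAKE=1
pip install llama-cpp-python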

Also, when I launch the webui and choose a GGML model, I get something like this in the console:

llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 0.14 MB
llama_model_load_internal: mem required = 19712.68 MB (+ 3124.00 MB per state)
llama_new_context_with_model: kv self size = 3120.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
2023-07-19 23:05:22 INFO:Loaded the model in 8.17 seconds.

I am using Windows and an Nvidia card.

Is there an easy way to enable offloading layers to the GPU that doesn't require installing a ton of stuff?
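For reference, what I am ultimately trying to do (assuming a llama-cpp-python build with GPU support) is just to pass the flag I mentioned when launching the webui, something like:

rem hypothetical launch line; it only takes effect if llama-cpp-python was built with cuBLAS
python server.py --n-gpu-layers 10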

Labels: bug (Something isn't working), hardware (Hardware specific issue), llama.cpp (Problem with llama.cpp shared lib)
