Description
Hello, I am a complete newbie when it comes to LLMs.
I installed a GGML model in the oobabooga webui and tried to use it. It works fine, but only from RAM: it uses just 0.5 GB of VRAM, and I can't find any way to change that (i.e. offload some layers to the GPU); even pasting the line "--n-gpu-layers 10" into the webui does nothing. So I started searching, and one of the answers I found suggests this:
pip uninstall -y llama-cpp-python
set CMAKE_ARGS="-DLLAMA_CUBLAS=on"
set FORCE_CMAKE=1
pip install llama-cpp-python --no-cache-dir
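(One thing I noticed while searching: cmd's set keeps the quotation marks as part of the variable's value, so if I understand correctly those two lines should probably be written without quotes:
set CMAKE_ARGS=-DLLAMA_CUBLAS=on
set FORCE_CMAKE=1
I'm not sure whether that alone explains the build failure below, though.)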
But that didn't work for me. After running it, I got:
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
And it completely broke the llama folder: it uninstalled llama-cpp-python and then did nothing else. I had to update the webui to fix it and download llama.cpp again, since I have no other way to get it.
I also tried the compile-from-source method, but that didn't work either. When I paste CMAKE_ARGS="-DLLAMA_OPENBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python into CMD (both plain Windows CMD and the CMD window that ships with oobabooga), I always get this message:
'CMAKE_ARGS' is not recognized as an internal or external command,
operable program or batch file.
or
'FORCE_CMAKE' is not recognized as an internal or external command,
operable program or batch file.
The same happens with the make command: it is not recognized, even though I have make and CMake installed.
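(From what I can tell, the VAR=value prefix in front of a command is Linux shell syntax that cmd doesn't understand, which would explain these "not recognized" errors. If so, I guess the cmd equivalent would be separate set lines:
set CMAKE_ARGS=-DLLAMA_OPENBLAS=on
set FORCE_CMAKE=1
pip install llama-cpp-python --no-cache-dir
though I suspect that would just bring me back to the wheel-build error above.)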
Also, when I launch the webui and choose a GGML model, I get something like this in the console:
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32001
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 6656
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 52
llama_model_load_internal: n_layer = 60
llama_model_load_internal: n_rot = 128
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 17920
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 0.14 MB
llama_model_load_internal: mem required = 19712.68 MB (+ 3124.00 MB per state)
llama_new_context_with_model: kv self size = 3120.00 MB
AVX=1 | AVX2=1 | AVX512=0 | AVX512_VBMI=0 | AVX512_VNNI=0 | FMA=1 | NEON=0 | ARM_FMA=0 | F16C=1 | FP16_VA=0 | WASM_SIMD=0 | BLAS=0 | SSE3=1 | VSX=0 |
2023.07.19 23:05:22 INFO:Loaded the model in 8.17 seconds.
I am using Windows and an Nvidia card. I also notice BLAS=0 in the console output above, which as far as I understand means this llama.cpp build has no BLAS/GPU support at all.
Is there an easy way to enable GPU layer offloading that doesn't require installing a ton of stuff?
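(To be concrete, what I'm hoping for is that something as simple as this just works; the model name is only an example, and --n-gpu-layers is the flag that, as far as I understand, the webui passes through to llama.cpp:
python server.py --model my-30b-ggml-model --n-gpu-layers 10
Right now that flag seems to be ignored entirely.)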