Description
I am running this code:
```python
%%capture
!pip install huggingface_hub
#!pip install langchain
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_name_or_path = "TheBloke/Llama-2-13B-chat-GGML"
model_basename = "llama-2-13b-chat.ggmlv3.q5_1.bin"  # the model is in GGML .bin format

model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)

# GPU
lcpp_llm = Llama(
    model_path=model_path,
    n_threads=2,      # CPU cores
    n_batch=512,      # should be between 1 and n_ctx; consider the amount of VRAM in your GPU
    n_gpu_layers=32,  # change this value based on your model and your GPU VRAM pool
    n_ctx=1500,
)
```
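Because the install uses `--upgrade`, it always pulls the latest release, so the installed version can change from day to day. For reference, this is a minimal check of which llama-cpp-python build the cell above actually installed:

```python
from importlib.metadata import version

# Report the exact llama-cpp-python release the install cell pulled in;
# with --upgrade this can differ between runs.
print(version("llama-cpp-python"))
```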
but I get this error:
```text
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-7-a3e79fc7c065> in <cell line: 4>()
      2 from llama_cpp import Llama
      3 #lcpp_llm = None
----> 4 lcpp_llm = Llama(
      5     model_path=model_path,
      6     n_threads=2, # CPU cores

/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py in __init__(self, model_path, n_ctx, n_parts, n_gpu_layers, seed, f16_kv, logits_all, vocab_only, use_mmap, use_mlock, embedding, n_threads, n_batch, last_n_tokens_size, lora_base, lora_path, low_vram, tensor_split, rope_freq_base, rope_freq_scale, n_gqa, rms_norm_eps, mul_mat_q, verbose)
    321             self.model_path.encode("utf-8"), self.params
    322         )
--> 323         assert self.model is not None
    324
    325         if verbose:

AssertionError:
```
I didn't face this problem two days ago. How can I fix it?
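My guess is that `assert self.model is not None` failing means `llama_cpp` could not load the file at all, and that a newer llama-cpp-python release (after the llama.cpp switch from GGML to GGUF) no longer accepts GGML `.bin` models, which would explain why the same code broke overnight. If that is right, pinning the last GGML-compatible release should restore the old behavior. An untested sketch of what I plan to try, where 0.1.78 is my assumption for that last compatible version:

```python
# Untested workaround: pin the last llama-cpp-python release that (as far as
# I can tell) still loads GGML .bin files; newer releases expect GGUF models.
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.78 --force-reinstall --no-cache-dir
```

Alternatively, switching to a GGUF export of the same model (e.g. a `TheBloke/Llama-2-13B-chat-GGUF` repo) should work with the latest release, but I haven't verified the exact repo or filename.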