Description
I used the Dockerfile from openblas_simple to build the Docker image llamapython-cpu, then ran it with this command:
sudo docker run --rm -it -p 8000:8000 -v /home/cd/ai/baichuan/baichuan-ggml:/models -e MODEL=/models/ggml-model-q4_0.bin llamapython-cpu
I used Postman to send a request, and it works fine. However, when I send a message longer than 50 tokens, the server throws an error and stops:
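To make the failing request reproducible without Postman, here is a minimal sketch using only the Python standard library. It assumes the container serves llama-cpp-python's OpenAI-compatible `/v1/completions` endpoint on the port mapped by `-p 8000:8000`, and that the JSON fields `prompt` and `max_tokens` are accepted. The helper names `build_request` and `send`, and the word-repetition prompt, are hypothetical and exist only for illustration. Word count is only a rough stand-in for token count, because the real count depends on the model's tokenizer.

```python
import json
import urllib.request

# Assumption: llama-cpp-python's OpenAI-compatible completions endpoint,
# exposed on the host through `-p 8000:8000`.
URL = "http://localhost:8000/v1/completions"


def build_request(n_words: int, max_tokens: int = 16) -> urllib.request.Request:
    """Build a POST request whose prompt repeats one word n_words times.

    Hypothetical helper: a repeated word is used only as a rough way to push
    the prompt past ~50 tokens; the exact token count depends on the tokenizer.
    """
    prompt = " ".join(["hello"] * n_words)
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply.

    Raises urllib.error.URLError if the server has crashed or is unreachable,
    which is how the failure described above would surface here.
    """
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, `send(build_request(10))` should correspond to the short request that works, while `send(build_request(80))` approximates the longer message that reportedly crashes the OpenBLAS build.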
I have 64 GB of RAM and have confirmed that enough memory is available, so this is definitely not an out-of-memory problem. When I don't use OpenBLAS, the problem does not occur.
I don't know why this happens. Thanks for your great work!