
Bug: unknown pre-tokenizer type: 'mistral-bpe' when running the new Mistral-Nemo model #493

Open
wingenlit opened this issue Jul 19, 2024 · 5 comments


@wingenlit

Contact Details

No response

What happened?

Hi there, I have just attempted to run the new Mistral-Nemo with llamafile on a GGUF file quantized with llama.cpp b3405. It failed with the error unknown pre-tokenizer type: 'mistral-bpe' (log shown below). Is there a replacement string I can pass with --override-kv tokenizer.ggml.pre=str:{some_tokenizer_type_here}, or should I just wait for a future version?

./llamafile-0.8.9 --cli -m /mnt/Mistral-Nemo-Instruct-2407-Q4_K_M.gguf --temp 0.2 -p "write something here:" -ngl 999 --no-display-prompt
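For reference, the kind of override I was thinking of would look roughly like this (the unknown tokenizer type is left as a placeholder):

./llamafile-0.8.9 --cli -m /mnt/Mistral-Nemo-Instruct-2407-Q4_K_M.gguf --override-kv tokenizer.ggml.pre=str:{some_tokenizer_type_here} --temp 0.2 -p "write something here:" -ngl 999 --no-display-prompt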

Thanks in advance.

Version

llamafile v0.8.9

What operating system are you seeing the problem on?

Linux, Windows

Relevant log output

llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'mistral-bpe'
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/mnt/Mistral-Nemo-Instruct-2407-Q4_K_M.gguf'
@raymondllu

Got exactly the same issue when loading the Mistral-Nemo-2407 model using LM Studio, which is also based on llama.cpp. Waiting for the fix!

Not sure if the issue reported in ggerganov/llama.cpp#8577 is related, btw.

@jart
Collaborator

jart commented Jul 22, 2024

We're excited about Nemo too. Once support is implemented upstream, we naturally intend to find a way to incorporate it here.

@wingenlit
Author

UPDATE

llama.cpp added support for Mistral-Nemo in version b3436 onwards, so llamafile should be able to pick it up in an upcoming update.

For information only: as a result, some earlier GGUF checkpoints quantized with forked versions of llama.cpp might not work with the latest llama.cpp. The GGUF I am using (thanks to bartowski) is tested and working; repos from others will likely be updated soon.
P.S. The default context size for Mistral-Nemo is huge at 128k; it tricked me into thinking a memory leak had happened the first time. I'd advise starting with a smaller context, e.g. --ctx-size 10000, and then raising it until VRAM is adequately used, as in the example below.
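A rough first-run sketch (same model path and flags as my command above; the context value is just a starting point):

./llamafile-0.8.9 --cli -m /mnt/Mistral-Nemo-Instruct-2407-Q4_K_M.gguf --ctx-size 10000 --temp 0.2 -p "write something here:" -ngl 999 --no-display-prompt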

@jart
Collaborator

jart commented Jul 23, 2024

I can't cherry-pick ggerganov/llama.cpp@50e05353e88d50b644688caa91f5955e8bdb9eb9 because the code it touches has had a considerable amount of churn upstream recently. It'll have to wait until the next full synchronization with upstream. Right now I'm focused primarily on developing a new server. Contributions are welcome on backporting Nemo support. I know this feature is important too, so @stlhood should probably chime in on where our priorities should be. Upstream has also been making problematic changes to ggml-cuda lately that prevent us from using it the way it's written, since upstream refused our request to add #ifdef statements that would make syncing simpler by disabling features that significantly increase code size.

@jart jart reopened this Jul 23, 2024
@wingenlit
Author

Sorry about closing the issue earlier without the inside knowledge. Will wait for the problem to be resolved.
