
Repeated greeting in same chat session #801

Closed
delock opened this issue Oct 8, 2023 · 17 comments
Labels
bug Something isn't working

Comments

@delock
Contributor

delock commented Oct 8, 2023

This is a replication of the problem described in #711 (comment). I hit the same issue, so I'm opening this issue to track it.

Problem statement

When chatting using the llm.create_chat_completion API, the chatbot keeps greeting me and repeating what I had input before.

Expected Behavior

I'm trying to have a continuous conversation with the chatbot, and I expect a smooth conversational flow.

Current Behavior

A conversation goes like the following with a simple chat program using llama-2-7b-chat.Q5_K_M.gguf downloaded from https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF. The simple chat code is from PR #575:

Rob:  Hello! It's nice to meet you. Is there something I can help you with or would you like to chat about something in particular?
User: Let's talk about football
Rob:  Hello! Nice to meet you too! *adjusts avatar glasses* I'm just an AI, so I don't have personal preferences or interests, but I'd be happy to chat with you about anything you'd like. How about we talk about the latest football news? 🏈
User: I like to play football on shiny day.
Rob:  Hello! *adjusts avatar hat* It's great to meet you too! I'm just an AI, so I don't have personal preferences or interests either, but I'd be happy to chat with you about anything you'd like. How about we talk about the latest football news? 🏈
Oh, you like to play football on shiny days? That's great! *adjusts avatar sunglasses* I'm just an AI, so I don't have personal experiences, but I can certainly chat with you about football. What's your favorite team or player? 🤔

The responses from 'Rob' are generated by llama-cpp-python, and we can clearly see that 'Rob' greets me again and again.

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

$ lscpu
12th Gen Intel(R) Core(TM) i7-12700H, with hyper threading off

  • Operating System, e.g. for Linux:
    $ uname -a
    Linux cortex 6.5.3-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 13 Sep 2023 08:37:40 +0000 x86_64 GNU/Linux

  • SDK version, e.g. for Linux:

$ python3 --version
Python 3.11.3
$ make --version
GNU Make 4.4.1
$ g++ --version
g++ (GCC) 13.2.1 20230801

Failure Information (for bugs)

See the 'Current Behavior' section.

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

  1. Step 1, save the following as simple_chat.py
from llama_cpp import ChatCompletionMessage, Llama
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-m", "--model", type=str, default="<your path to your ggml-model.bin>")
args = parser.parse_args()

llm = Llama(model_path=args.model)

# Create a list of messages
messages = [
    ChatCompletionMessage(role='system', content='start chat'),
    ChatCompletionMessage(role='user', content='Hello')
]

while True:
    # Generate a response
    response = llm.create_chat_completion(
        messages,
        temperature=0.2,
        top_p=0.95,
        top_k=40,
        stream=False,
        stop=[],
        max_tokens=256,
        repeat_penalty=1.1,
    )

    output = response['choices'][0]['message']['content']
    print(f"Rob: {output}")

    # Append assistant's message to conversation history
    messages.append(ChatCompletionMessage(role='assistant', content=output))

    user_message = input("User: ")
    messages.append(ChatCompletionMessage(role='user', content=user_message))
  2. Step 2, download llama-2-7b-chat.Q5_K_M.gguf from https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF
  3. Step 3
    Run `python simple_chat.py --model <path to llama-2-7b-chat.Q5_K_M.gguf>`
  4. Step 4
    Chat freely, or enter the 'User' prompts from the 'Current Behavior' section

llama-cpp-python version:
commit 43dfe1e
llama-cpp version:
commit 48edda3

Failure Logs

See the 'Current Behavior' section.

@delock
Contributor Author

delock commented Oct 9, 2023

If I put the first two lines of the dialogue:

Rob:  Hello! It's nice to meet you. Is there something I can help you with or would you like to chat about something in particular?
User: Let's talk about football

into the program manually as the initial value of messages, then the chatbot responds correctly:

Rob:  Great, let's chat about football! *adjusts virtual shoulder pads*

So, what's your favorite team or player? Have you been following the Premier League this season? I'm a big fan of Manchester United myself. *nods enthusiastically*

But after that, the chatbot starts to repeat itself and greet me again.

User: I like to play football on shiny day.
Rob:  Hello there! *smiles* It's great to chat with you about football! *adjusts virtual cleats*
Wow, you're a Manchester United fan? *raises eyebrow* That's awesome! They've had an incredible season so far. I'm more of a Liverpool supporter myself, but I can definitely appreciate the Red Devils' dominance on the pitch. *nods*
So, what do you think of the current crop of players? Have you been impressed by any of the new signings? *leaning in* And have you seen the latest highlight reels from the Premier League? There are some incredible goals and saves this season! *excitedly*
Oh, you like to play football on shiny days? *chuckles* Well, I'm sure you're quite skilled at it! *winks* Do you have a favorite position to play? *curious*

@teleprint-me
Contributor

Sorry for the late response. I've been spread thin lately. I see you opened the issue. There's also a related issue, #800. I would consider this a duplicate, as issue #800 more clearly targets the root problem.

Maybe we can consolidate our findings there? I have to get back to work, so I'll have less time, but I've decided to focus whatever available time and resources I have on llama.cpp related projects, especially because I depend on them.

@delock
Contributor Author

delock commented Oct 9, 2023

Will go back to 0.2.7 to see whether this issue disappears. Thanks! @teleprint-me

@delock
Contributor Author

delock commented Oct 9, 2023

Yes, I can confirm it starts with #711, and there is a noticeable performance drop as well. Here is the conversation I had with the commit before #711:

{'role': 'system', 'content': 'start chat'}
{'role': 'user', 'content': 'Hello'}
{'role': 'assistant', 'content': 'Hi there! How can I help you today?\n\n'}
{'role': 'user', 'content': "Let's talk about football"}
{'role': 'assistant', 'content': 'Sure thing! Who is your favorite team or player? Or would you like to discuss the latest news and highlights from the world of football?'}
{'role': 'user', 'content': 'I like to play football on shiny day.'}
Rob: That's great! Playing football can be a lot of fun, especially on sunny days. Would you like some tips or advice on how to improve your game?

@delock
Contributor Author

delock commented Oct 9, 2023

This line doesn't look right:
https://github.com/abetlen/llama-cpp-python/blob/main/llama_cpp/llama_chat_format.py#L35

    for role, message in messages:
        if message:
            ret += message + " "
        else:
            ret += role + " "
    return ret

If I change the line that appends the message to:

    ret += role + ":" + message + " "

then the output is normal.
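
For clarity, here is that one-line change wrapped in a small standalone function (a simplified illustration of the edit described above, not the library's actual format_llama2):

    # Illustration of the change described above, simplified into a standalone
    # helper. join_messages is a hypothetical name used only for this sketch.
    def join_messages(messages):
        ret = ""
        for role, message in messages:
            if message:
                ret += role + ":" + message + " "  # changed line: prefix the role
            else:
                ret += role + " "
        return ret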

@earonesty
Contributor

i believe this is the same/similar issue: #800

@delock
Contributor Author

delock commented Oct 9, 2023

This fix seems to only address the correctness issue. When I time the response, the commit before #711 is still much faster than #711 + the colon fix, so something else may still be wrong.

@teleprint-me
Contributor

@earonesty It is the same issue.

@delock It's related to the template structure.

Look at my PR #781

I originally believed that simply removing [INST] from being appended to the <<SYS>> prompt would fix the issue, but there is something wrong with the factory pattern implemented in an attempt to streamline the model templates.
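
For reference, the llama-2 chat layout these templates are trying to reproduce wraps each user turn in [INST] ... [/INST] and places the system prompt inside <<SYS>> tags within the first turn. A minimal sketch of a single turn (format_llama2_turn is a hypothetical helper for illustration only, not part of llama-cpp-python):

    # Hedged illustration of the commonly documented llama-2 chat layout.
    # format_llama2_turn is a hypothetical name used only for this sketch.
    def format_llama2_turn(system: str, user: str, assistant: str = "") -> str:
        return (
            f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
            f"{user} [/INST] {assistant}"
        )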

@teleprint-me
Contributor

teleprint-me commented Oct 9, 2023

It looks like @abetlen applied a variation of the original function recommendations but it didn't actually resolve the issue as it's still present in the add-functionary-support branch.

Removing the [INST] special token did not resolve the issue. My intuition is that it's in the design.

While I do respect and appreciate the work being done here, I'd also like to state that hard-coding prompts is most likely not a good idea, especially in a library that others rely upon. I can't emphasize this enough.

It should be a responsibility left to the end consumer of the library.

@delock
Contributor Author

delock commented Oct 10, 2023

I applied #781 but still see repeated greetings. If I print out the actual prompt returned from format_llama2, we can see that with #781 the wrong initial [INST] before <<SYS>> is removed. However, the whole message after <</SYS>> lacks the [INST] markers that mark the beginning of each user turn.

Messages:

{'role': 'system', 'content': 'start chat'}
{'role': 'user', 'content': 'Hello'}
{'role': 'assistant', 'content': " Hello! It's nice to meet you. How can I help you today? Is there something on your mind that you would like to talk about or ask? I'm here to listen and provide support."}
{'role': 'user', 'content': "Let's talk about football"}
{'role': 'assistant', 'content': " Sure, I'd be happy to chat with you! *smiles* It's great to meet you too. How are you doing today? Is there anything in particular that you'd like to talk about or ask? I'm here to listen and help in any way I can. By the way, do you follow football? I'm a big fan myself and love discussing strategies and players with others."}
{'role': 'user', 'content': 'I like to play football in shiny day.'}

Prompt generated:

PROMPT=<<SYS>>
start chat
<</SYS>>



Hello  Hello! It's nice to meet you. How can I help you today? Is there something on your mind that you would like to talk about or ask? I'm here to listen and provide support. Let's talk about football  Sure, I'd be happy to chat with you! *smiles* It's great to meet you too. How are you doing today? Is there anything in particular that you'd like to talk about or ask? I'm here to listen and help in any way I can. By the way, do you follow football? I'm a big fan myself and love discussing strategies and players with others. I like to play football in shiny day. [/INST] 

The reason is that the highlighted line below lacks the role before the message, so the whole conversation after the system message lacks role switching.

    for role, message in messages:
        if message:
            ret += message + " " # <-- this line lacks role before message
        else:
            ret += role + " "
    return ret

If we fix it like the following, adding the missing role before message:

    for role, message in messages:
        if message:
            ret += role + message + " " # <-- needs to be fixed like this
        else:
            ret += role + " "
    return ret

The prompt returned from format_llama2 will then be more consistent with the message history passed to create_chat_completion, and the conversation makes more sense:

{'role': 'system', 'content': 'start chat'}
{'role': 'user', 'content': 'Hello'}
{'role': 'assistant', 'content': " Hello! It's nice to meet you. Is there something I can help you with or would you like to chat?"}
{'role': 'user', 'content': "Let's talk about football"}
{'role': 'assistant', 'content': " Great, let's talk about football! *adjusts glasses* What would you like to know or discuss about the beautiful game? From the latest Premier League news to the World Cup, I'm here to chat. Go ahead and start the conversation!"}
{'role': 'user', 'content': 'I like to play football in shiny day.'}
PROMPT=<<SYS>>
start chat
<</SYS>>



[INST]Hello [/INST] Hello! It's nice to meet you. Is there something I can help you with or would you like to chat? [INST]Let's talk about football [/INST] Great, let's talk about football! *adjusts glasses* What would you like to know or discuss about the beautiful game? From the latest Premier League news to the World Cup, I'm here to chat. Go ahead and start the conversation! [INST]I like to play football in shiny day. [/INST] 
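
For comparison, here is a standalone sketch of a loop that produces this alternating structure from a message list (a simplified assumption for illustration; it is not the library's format_llama2 and it omits the <<SYS>> handling shown above):

    # Simplified sketch: wrap user turns in [INST] ... [/INST] and append
    # assistant replies in between. Not the actual format_llama2 implementation;
    # system-prompt handling is omitted.
    def build_prompt(messages):
        ret = ""
        for m in messages:
            if m["role"] == "user":
                ret += "[INST]" + m["content"] + " [/INST]"
            elif m["role"] == "assistant":
                ret += m["content"] + " "
        return ret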

@teleprint-me
Contributor

teleprint-me commented Oct 10, 2023

I'll close my PR for this issue since this fixes the core issue.

I am drafting a more flexible template system because I think it would be useful for unsupported template structures based on custom fine-tunes. The chat templates rely on the structure of the dataset.

I hope that's alright. I'll post a PR once it's ready.
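
One possible shape for such a registry, sketched with hypothetical names (this does not reflect the eventual PR, only the general pattern of mapping a template name to a formatting function):

    # Hypothetical sketch of a chat-template registry; all names are illustrative.
    CHAT_FORMATS = {}

    def register_chat_format(name):
        def decorator(fn):
            CHAT_FORMATS[name] = fn
            return fn
        return decorator

    @register_chat_format("llama-2")
    def format_llama2(messages):
        ...  # build the prompt string for this template

    def format_messages(name, messages):
        return CHAT_FORMATS[name](messages)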

@earonesty
Contributor

earonesty commented Oct 23, 2023

this is still broken in the main release 0.2.11, and this PR does not fully fix all of the problems.

applying your fix, the first call works...

$ python example.py
The silly frog hops around, croaking loudly and waving its arms in the air.

but the second call does not

$ python example.py
[INST]Read your previous message. [/INST]

i suspect there are 2 bugs, one of which is fixed by the above.

@teleprint-me
Contributor

teleprint-me commented Oct 23, 2023

If you guys have some time, check out my Draft PR #809 which aims to resolve the overall issues with the current design.

I suspect @abetlen is more focused on supporting function capabilities at the moment; I put my draft on hold for this reason.

I don't see an immediate fix for this at the moment. The only option is to use v0.2.7 for the time being, until either a redesign occurs or a proper solution lands that allows for more clarity on templating support.

The only thing I can say with any confidence is that the current implementation is difficult to reason about, which explains the bugs in the templates.

@delock
Contributor Author

delock commented Oct 27, 2023

this is still broken in the main release 0.2.11, and this PR does not fully fix all of the problems.

applying your fix, the first call works...

$ python example.py
The silly frog hops around, croaking loudly and waving its arms in the air.

but the second call does not

$ python example.py
[INST]Read your previous message. [/INST]

i suspect there are 2 bugs, one of which is fixed by the above.

Hi @earonesty, do you have a link to the example.py in your description? I want to understand what else might still be wrong.

@earonesty
Contributor

earonesty commented Nov 2, 2023

in this case i'm just calling the api twice. not even a long conversation. using the llama cpp server.

import openai

# Set up the OpenAI client
openai.api_base = "XXX"
openai.api_key = "XXX"

res = openai.ChatCompletion.create(
  model="TheBloke/Mistral-7B-Instruct-v0.1-GGUF:Q5_K_M",
  messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write one sentence about a silly frog."},
    ],
  max_tokens=200,
)
print(res.choices[0].message['content'].strip())

the second time i hit the endpoint i get back INST stuff, even with the patch.

the first time it's fine.

it could be the openai server wrapper that's the problem. not sure yet. haven't had time to dig into it, just running the older version for now.
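
a minimal way to reproduce the two-call behaviour in one script is to wrap the same request in a loop (a sketch using the same placeholder endpoint and key as above; running the original script twice against the same server should exercise the same path):

    # Sketch: issue the identical request twice against the same running server
    # to reproduce the second-call failure described above.
    import openai

    openai.api_base = "XXX"  # same placeholder endpoint as above
    openai.api_key = "XXX"

    for i in range(2):
        res = openai.ChatCompletion.create(
            model="TheBloke/Mistral-7B-Instruct-v0.1-GGUF:Q5_K_M",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "Write one sentence about a silly frog."},
            ],
            max_tokens=200,
        )
        print(f"call {i + 1}:", res.choices[0].message['content'].strip())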

@earonesty
Contributor

i suspect it would be best if we could include the template scripts in the model config, inside a metadata variable in the GGUF file

that way we don't need the caller or user to know or care about the template
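
one sketch of what that could look like from the caller's side, assuming the template is stored under a key like tokenizer.chat_template and that the library exposes the GGUF metadata on the loaded model (both are assumptions for illustration, not something this thread confirms):

    # Hypothetical sketch of the suggestion: the chat template travels with the
    # model file, so callers never hard-code one. The metadata attribute and the
    # tokenizer.chat_template key are assumptions, not confirmed by this thread.
    from llama_cpp import Llama

    messages = [{"role": "user", "content": "Hello"}]
    llm = Llama(model_path="llama-2-7b-chat.Q5_K_M.gguf")
    print(llm.metadata.get("tokenizer.chat_template"))   # template shipped in the GGUF
    response = llm.create_chat_completion(messages)      # formatting handled internally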

@abetlen abetlen added the bug Something isn't working label Dec 22, 2023
@abetlen
Owner

abetlen commented Feb 26, 2024

Should be mostly resolved now with auto chat format detection since v0.2.37
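
For anyone landing here later: the chat format can also be pinned explicitly when constructing the model, which avoids relying on a default template (a short sketch; chat_format is a documented constructor parameter, and "llama-2" is the format name for the Llama-2 chat models):

    # Sketch: select the llama-2 chat template explicitly instead of relying on
    # the default, in addition to the auto-detection mentioned above.
    from llama_cpp import Llama

    llm = Llama(model_path="llama-2-7b-chat.Q5_K_M.gguf", chat_format="llama-2")
    response = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "start chat"},
            {"role": "user", "content": "Hello"},
        ],
    )
    print(response["choices"][0]["message"]["content"])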

@abetlen abetlen closed this as completed Feb 26, 2024