
Protect against errors raised when adding a request to the engine #230

Merged: 5 commits into octoml:batch-serving on Mar 15, 2024

Conversation

masahi (Member) commented Mar 14, 2024

Previously, when an exception was raised in get_new_request_state(...) due to a malformed request, the server would simply die.

An example of such an error:

  File "/home/masahi/projects/dev/mlc-llm/serve/mlc_serve/engine/staging_engine.py", line 122, in add                              
    state = get_new_request_state(
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/masahi/projects/dev/mlc-llm/serve/mlc_serve/engine/engine_common.py", line 57, in get_new_request_state
    prompt = conversation_template.apply(request.messages)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/masahi/projects/dev/mlc-llm/serve/mlc_serve/model/tokenizer.py", line 41, in apply
    return self._tokenizer.apply_chat_template(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/masahi/miniconda3/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1745, in apply_chat_template
    rendered = compiled_template.render(
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/masahi/miniconda3/lib/python3.11/site-packages/jinja2/environment.py", line 1301, in render
    self.environment.handle_exception()
  File "/home/masahi/miniconda3/lib/python3.11/site-packages/jinja2/environment.py", line 936, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "<template>", line 1, in top-level template code
  File "/home/masahi/miniconda3/lib/python3.11/site-packages/jinja2/sandbox.py", line 393, in call
    return __context.call(__obj, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/masahi/miniconda3/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1790, in raise_exception
    raise TemplateError(message)
jinja2.exceptions.TemplateError: Conversation roles must alternate user/assistant/user/assistant/...

And a repro script:

import openai

def jsonmode(endpoint, model, prompt, schema):
    client = openai.OpenAI(base_url=f"{endpoint}/v1", api_key="xxxxx")
    response_format={"type": "json_object", "schema": schema}
    chat_completion = client.chat.completions.create(
        model=model,
        messages=[
            {
                # Note: mistral-7b-instruct's chat template rejects the system
                # role, which appears to be what triggers the template error above.
                "role": "system",
                "content": "You are a helpful assistant.",
            },
            {
                "role": "user",
                "content": prompt,
            },
        ],
        temperature=0,
        max_tokens=512,
        response_format=response_format,
    )

    jsonstr = chat_completion.choices[0].message.content
    return jsonstr


schema_without_circular_dependency = {
    "$defs": {
        "Person": {"type": "object", "properties": {"name": {"type": "string"}}},
        "Family": {
            "type": "object",
            "properties": {
                "last_name": {"type": "string"},
                "members": {"type": "array", "items": {"$ref": "#/$defs/Person"}},
            },
        },
    },
}

endpoint = "http://localhost:9000"
model = "mistral-7b-instruct"

prompt = "The Doe family has members John and Jane"
winner = jsonmode(endpoint, model, prompt, schema_without_circular_dependency)
print(winner)

@yelite @elvin-n

RequestOutput(
    req_id,
    sequences=[],
    error=err_msg,
masahi (Member, Author) commented Mar 14, 2024:

By returning a non-None error here, the exception is now raised at https://github.com/octoml/mlc-llm/blob/batch-serving/serve/mlc_serve/engine/async_connector.py#L88-L89, like:

  File "/home/masahi/projects/dev/mlc-llm/serve/mlc_serve/api/handler.py", line 159, in request_completion
    return await collect_result_stream(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/masahi/projects/dev/mlc-llm/serve/mlc_serve/api/handler.py", line 233, in collect_result_stream                                                                              
    async for res in result_generator:                                                       
  File "/home/masahi/projects/dev/mlc-llm/serve/mlc_serve/engine/async_connector.py", line 89, in generate
    raise TextGenerationError(output.error)
mlc_serve.engine.error.TextGenerationError: Conversation roles must alternate user/assistant/user/assistant/...
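For reference, the check at async_connector.py#L88-L89 amounts to roughly the following (a paraphrased sketch, not the verbatim source; output is the RequestOutput yielded back by the engine inside generate()):

    # Paraphrased; the surrounding generate() loop is elided.
    if output.error is not None:
        raise TextGenerationError(output.error)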

Using the standalone MLC server, a client still gets openai.InternalServerError: Internal Server Error as a response, but the server doesn't die.

Would that be OK? I assume ollm has proper response-handling logic for such a case. @jroesch

A member replied:

Yes, as long as we get an exception back, we should catch and convert it properly!

masahi changed the title from "Error recover json" to "Protect against errors raised when adding a request to the engine" on Mar 14, 2024
        )
        new_request_states.append(state)
    except Exception as e:
        LOG.warn("Failed to add a request", request_id=req.request_id)
A reviewer commented:

Would it be better to just throw the error here and catch it at https://github.com/octoml/ollm/blob/d29be36231e666f761a2fb08dbf0e4ce758618f4/mlc-serve/mlc_serve/engine/async_connector.py#L153? I think we initially assumed engine.add cannot fail; now might be a good time to revisit that assumption. One caveat of deferring error reporting to engine.step is that the streaming API can no longer return an error status code once streaming begins (because of how server-sent events work).

Just sharing some thoughts. No need to block this PR if it's related to a production issue.

masahi (Member, Author) replied:

I like that for its simplicity, and it also removes the need for an additional lock. I reworked this PR according to your suggestion, but I'm not sure what to do after catching the error in async_connector.py. The following seems to work, but is this the right way?

try:
    await asyncio.to_thread(self.engine.add, [request])
except TextGenerationError as e:
    raise asyncio.CancelledError(e)

yelite replied Mar 15, 2024:

I don't think any change is needed in async_connector, but I'm not fully sure. The TextGenerationError will just propagate to the HTTP handler, and the regular error handling can happen there. Because it's engine.add that fails, we don't need to call engine.cancel either (as in https://github.com/octoml/ollm/blob/de6378ee6a1391276530e94b7b4374f01792c8ae/mlc-serve/mlc_serve/engine/async_connector.py#L99).

yelite replied Mar 15, 2024:

One benefit of throwing the error in engine.add is that the HTTP handler will be able to respond with a failure HTTP status code for streaming requests. Throwing a CancelledError would confuse the handler.
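For illustration, a hypothetical handler-side sketch of that point (not the actual mlc_serve/api/handler.py code; stream_results and the FastAPI-style wiring are assumptions made for this example, while engine.add, asyncio.to_thread, and TextGenerationError come from this thread):

import asyncio

from fastapi.responses import JSONResponse, StreamingResponse
from mlc_serve.engine.error import TextGenerationError


async def request_completion(request, engine):
    try:
        # Adding the request can now fail up front (e.g. malformed chat messages),
        # before any server-sent events have been emitted.
        await asyncio.to_thread(engine.add, [request])
    except TextGenerationError as e:
        # Streaming has not started yet, so a regular error status code still works.
        return JSONResponse(status_code=400, content={"error": str(e)})
    # Only now start streaming results back to the client.
    # stream_results: hypothetical async generator producing server-sent events.
    return StreamingResponse(stream_results(request), media_type="text/event-stream")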

masahi (Member, Author) replied:

Confirmed that not catching the error in async_connector.py still keeps the server from dying.
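To summarize the flow the thread converges on, a hypothetical sketch of the guarded add path, assuming the failure is wrapped in TextGenerationError and allowed to propagate (identifiers are borrowed from the traceback and the diff fragment above, not from the merged code; get_new_request_state's real signature may differ):

def add(self, requests):
    new_request_states = []
    for req in requests:
        try:
            # Can fail for malformed requests, e.g. a chat-template error.
            state = get_new_request_state(req)
            new_request_states.append(state)
        except Exception as e:
            LOG.warn("Failed to add a request", request_id=req.request_id)
            # Re-raise so the HTTP handler can return an error response
            # instead of the engine process dying.
            raise TextGenerationError(str(e)) from e
    ...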

adstraw (Member) commented Mar 15, 2024:

Ready to merge?

masahi merged commit fd4ea46 into octoml:batch-serving on Mar 15, 2024. 1 check passed.