Enable streaming option in the OpenAI API server #480

adk9 · 2024-05-16T18:54:25Z

Now that token streaming support has merged (#397), we can enable streaming responses in the OpenAI RESTful API endpoint.

This PR:

- adds missing package dependencies for the OpenAI API server (fixes #459, "Is openai compatible server still working?")
- re-enables streaming responses for the OpenAI API endpoint
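
For context on what a streamed response looks like on the wire: the OpenAI API delivers streamed chat completions as server-sent events, one `data: {json}` line per chunk, ending with `data: [DONE]`. The sketch below parses lines in that shape. It assumes this server follows the same convention, which the PR description does not spell out; `iter_sse_content` and the sample lines are hypothetical and exist only for illustration.

```python
import json


def iter_sse_content(lines):
    """Yield delta.content strings from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data field
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


# Hand-written sample lines (not captured from a real server).
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hi"}}]}',
    'data: {"choices": [{"delta": {"content": " there"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))
```

In practice the `openai` client shown in the Client section handles this parsing, so a helper like this is only useful when calling the endpoint with a plain HTTP library.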

Running the Server

python -m mii.entrypoints.openai_api_server \
    --model "mistralai/Mistral-7B-Instruct-v0.1" \
    --port 3000 \
    --host 0.0.0.0

Client

from openai import OpenAI

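client = OpenAI(
    # Replace ip:port with the server address, e.g. localhost:3000 for the command above.
    base_url="http://ip:port/v1",
    api_key="test",
)

completion = client.chat.completions.create(
    # Use the same model name the server was started with.
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[
        {
            "role": "user",
            "content": "Tell me a joke.",
        },
    ],
    max_tokens=1024,
    stream=True
)

# With stream=True the call returns an iterator of chunks, each holding an incremental delta.
for chunk in completion:
    # delta.content can be None (for example on a role-only or final chunk).
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)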
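
If the full reply is needed as a single string rather than printed piece by piece, the chunks can be accumulated instead. This is a minimal sketch, assuming each chunk has the `choices[0].delta.content` shape used above. `collect_stream` and `fake_chunk` are hypothetical helpers, and the `SimpleNamespace` objects only mimic that shape so the example runs without a server.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Concatenate the delta.content pieces of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # tolerate chunks that carry no choices
        content = chunk.choices[0].delta.content
        if content is not None:
            parts.append(content)
    return "".join(parts)


def fake_chunk(content):
    """Build a stand-in object shaped like a streamed chunk (illustration only)."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk(None), fake_chunk("Hello"), fake_chunk(", world"), fake_chunk(None)]
print(collect_stream(demo))  # prints: Hello, world
```

With a live server, `collect_stream(completion)` could be called in place of the printing loop; a stream can only be consumed once, so use one or the other.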

requirements/requirements.txt

.github/workflows/nv-a6000-fastgen.yml

adk9 added 2 commits May 16, 2024 18:33

Add required deps for OAI API server

42c18d0

Enable streaming for OpenAI API server

98fbe88

adk9 requested review from mrwyattii and awan-10 as code owners May 16, 2024 18:54

Formatting fixes

756cec6

loadams reviewed May 16, 2024

requirements/requirements.txt (outdated, resolved)

loadams mentioned this pull request May 31, 2024

Some fixes to make openai entrypoint work out of the box #485

Closed

loadams and others added 3 commits August 28, 2024 13:26

Merge branch 'main' into adk9/oai-streaming

630ae7c

Pin transformers as it is impacting unit tests

9450c8a

Test with no depth set

805a9bc

loadams reviewed Aug 28, 2024

.github/workflows/nv-a6000-fastgen.yml (outdated, resolved)

Undo pinning transformers

6fc3986

loadams approved these changes Sep 3, 2024

loadams merged commit 3ed3aa2 into main Sep 3, 2024
4 checks passed

loadams deleted the adk9/oai-streaming branch September 3, 2024 20:21
