Migrate to OpenAI SDK v1.0.0, using the guide and CLI tool here.
I manually tested this by sending requests and checking that the `RequestResult` looked acceptable for the following models: `gpt-3.5-turbo-instruct` (completion), `gpt-3.5-turbo-0613` (chat), `text-embedding-ada-002` (embedding), and `dall-e-2` (image).

The only behavior change is that we now use `"model"` instead of `"engine"` in requests to completion and embedding models, because `"engine"` is deprecated. This means that new requests will not match existing cache keys for completion and embedding models. Chat models were already using `"model"` and are unaffected.
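A minimal sketch of the cache-key implication, assuming the raw request kwargs double as the cache key (the helper below is a toy stand-in, not the actual caching code):

```python
# In openai>=1.0 the completions/embeddings APIs take `model`; the
# `engine` parameter from openai<1.0 is deprecated.
old_style = {"engine": "gpt-3.5-turbo-instruct", "prompt": "Hello", "max_tokens": 5}
new_style = {"model": "gpt-3.5-turbo-instruct", "prompt": "Hello", "max_tokens": 5}

def cache_key(request: dict) -> tuple:
    """Toy stand-in for a cache key derived from the raw request dict."""
    return tuple(sorted(request.items()))

# Same logical request, but the keys no longer match, so previously
# cached completion/embedding responses will be fetched again.
assert cache_key(old_style) != cache_key(new_style)
```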
Also fixes an existing issue where `ModerationAPIClient` is instantiated eagerly instead of lazily in `ServerService`.

Also adds a configurable `base_url`, which should allow the client to be used with vLLM.

Addresses #1997.
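The eager-vs-lazy fix above can be sketched as follows (hypothetical names and structure, not the actual HELM implementation):

```python
from functools import cached_property

class ModerationAPIClient:
    """Toy stand-in for the real client; constructing it may require
    credentials, which is why eager construction was a problem."""
    instances = 0

    def __init__(self) -> None:
        ModerationAPIClient.instances += 1

class ServerService:
    """Sketch of lazy instantiation via cached_property."""
    @cached_property
    def moderation_api_client(self) -> ModerationAPIClient:
        # Constructed on first access, then memoized on the instance.
        return ModerationAPIClient()

service = ServerService()
assert ModerationAPIClient.instances == 0   # nothing built at startup
client = service.moderation_api_client      # built on first access
assert client is service.moderation_api_client  # reused afterward
assert ModerationAPIClient.instances == 1
```

With `cached_property`, creating `ServerService` no longer constructs the moderation client; it is built only if something actually uses it.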