Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing Structured Outputs #939

Merged
merged 4 commits into from
Aug 22, 2024

Conversation

ivanleomk
Copy link
Collaborator

@ivanleomk ivanleomk commented Aug 20, 2024

I tried a more explicit title - but we can also do something like Announcing support for Structured Output


🚀 This description was created by Ellipsis for commit 1e49b66

Summary:

This PR adds a blog post comparing OpenAI's Structured Outputs with the instructor tool, addressing challenges and showcasing solutions with examples.

Key points:

  • Added blog post docs/blog/posts/introducing-structured-outputs.md on OpenAI's Structured Outputs.
  • Updated author avatar in docs/blog/.authors.yml.
  • Replaced Mode.STRUCTURED_OUTPUTS with Mode.TOOLS_STRICT in examples.
  • Discussed challenges: limited validation, streaming issues, latency spikes.
  • Included pydantic code examples for validation.
  • Benchmarked against instructor tool.
  • Highlighted instructor features: automatic validation, retries, real-time streaming, provider-agnostic API.
  • Demonstrated streaming and partial data extraction.

Generated with ❤️ by ellipsis.dev

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to e1eb60c in 11 seconds

More details
  • Looked at 489 lines of code in 2 files
  • Skipped 1 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. docs/blog/posts/introducing-structured-outputs.md:4
  • Draft comment:
    The slug is-instructor-dead is misleading and unrelated to the content of the blog post. Consider changing it to something more relevant, like introducing-structured-outputs.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The PR introduces a new blog post, but the slug in the front matter is misleading and unrelated to the content.
2. docs/blog/posts/introducing-structured-outputs.md:1
  • Draft comment:
    Ensure this new markdown file is added to mkdocs.yml for proper documentation inclusion.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The new markdown file should be added to mkdocs.yml for documentation consistency.

Workflow ID: wflow_SrIsJn2iqFKVC6aJ


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

cloudflare-workers-and-pages bot commented Aug 20, 2024

Deploying instructor with  Cloudflare Pages  Cloudflare Pages

Latest commit: 1e49b66
Status: ✅  Deploy successful!
Preview URL: https://2336d6ba.instructor.pages.dev
Branch Preview URL: https://introducing-structured-outpu.instructor.pages.dev

View logs

@jxnl
Copy link
Owner

jxnl commented Aug 20, 2024

change the title to "should i be using structured outputs"

and answer the question directly in the intro


# Is Instructor Dead?

## What's Open AI's Structured Output mode all about?

OpenAI's new Structured Output mode is a huge step change for developers building complex workflows. Given an arbitrary JSON Schema, Structured Output ensures that the response matches the schema exactly.

Here's a basic example.

to something like

# Two challenges with OpenAI's Structured outputs

With guaranteed schema adherence, outputs always conform to your defined Pydantic model, eliminating type mismatches and missing fields. However, while Structured Outputs solve many common issues, two key challenges emerge when building more sophisticated applications

1. Limited capabilities for reasking validations 
2. Limited Capabilities for streaming structured data

## What is Structured Output mode?

OpenAI's new Structured Output mode is a huge step change for developers building complex workflows. Given an arbitrary JSON Schema, Structured Output ensures that the response matches the schema exactly.

Here's a basic example.

@jxnl
Copy link
Owner

jxnl commented Aug 20, 2024

but in your body you mention more than streaming and validation, so you need to forshadow more.

  1. validation: ...
  2. streaming: ...
  3. latency: ...

Don't assume someone is going to read it without knowing what they get out of the article.

@jxnl
Copy link
Owner

jxnl commented Aug 20, 2024

remove the citations section

#> name='Jason' age=25
```

With guaranteed schema adherence, outputs always conform to your defined Pydantic model, eliminating type mismatches and missing fields. However, while Structured Outputs solve many common issues, two key challenges emerge when building more sophisticated applications - that of Validation and Streaming.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to top but theres more issues than these two, like latency, and also use a list

docs/blog/posts/introducing-structured-outputs.md Outdated Show resolved Hide resolved
#> {"name":"Jason","age":25}
```

## Should you be using Structured Output mode?
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you ask should i use it, but then you share latency metrics,

it should just be a section on latency

docs/blog/posts/introducing-structured-outputs.md Outdated Show resolved Hide resolved
docs/blog/posts/introducing-structured-outputs.md Outdated Show resolved Hide resolved

## What's Open AI's Structured Output mode all about?

OpenAI's new Structured Output mode is a huge step change for developers building complex workflows. Given an arbitrary JSON Schema, Structured Output ensures that the response matches the schema exactly.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. mention that we were referenced to the release article
  2. link to the release article


### Streaming

Streaming with Structured Outputs is supported but a challenging endeavour. There's no built-in partial validation and you need to manually parse the generated response while simultaneously having to now use a context manager to access the generated values.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link to docs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added streaming links at the bottom portion instead when we mention instructor

docs/blog/posts/introducing-structured-outputs.md Outdated Show resolved Hide resolved
@jxnl
Copy link
Owner

jxnl commented Aug 20, 2024

personally don't care if its 'Announcing support for Structured Output' it benefits no one,

"should i use structured outputs vs instructor" answers a qustion peopel have in their head.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on b3d4dec in 38 seconds

More details
  • Looked at 231 lines of code in 7 files
  • Skipped 0 files when reviewing.
  • Skipped posting 12 drafted comments based on config settings.
1. instructor/client.py:417
  • Draft comment:
    The mode STRUCTURED_OUTPUTS has been changed to TOOLS_STRICT. Ensure this change is consistent across the codebase and that all references to STRUCTURED_OUTPUTS are updated accordingly.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The PR changes the mode from STRUCTURED_OUTPUTS to TOOLS_STRICT in multiple places. This change should be consistent across the codebase. The change in instructor/client.py is consistent with the changes in other files.
2. instructor/dsl/iterable.py:97
  • Draft comment:
    The mode STRUCTURED_OUTPUTS has been changed to TOOLS_STRICT. Ensure this change is consistent across the codebase and that all references to STRUCTURED_OUTPUTS are updated accordingly.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The PR changes the mode from STRUCTURED_OUTPUTS to TOOLS_STRICT in multiple places. This change should be consistent across the codebase. The change in instructor/dsl/iterable.py is consistent with the changes in other files.
3. instructor/dsl/partial.py:173
  • Draft comment:
    The mode STRUCTURED_OUTPUTS has been changed to TOOLS_STRICT. Ensure this change is consistent across the codebase and that all references to STRUCTURED_OUTPUTS are updated accordingly.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The PR changes the mode from STRUCTURED_OUTPUTS to TOOLS_STRICT in multiple places. This change should be consistent across the codebase. The change in instructor/dsl/partial.py is consistent with the changes in other files.
4. instructor/function_calls.py:268
  • Draft comment:
    The mode STRUCTURED_OUTPUTS has been changed to TOOLS_STRICT. Ensure this change is consistent across the codebase and that all references to STRUCTURED_OUTPUTS are updated accordingly.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The PR changes the mode from STRUCTURED_OUTPUTS to TOOLS_STRICT in multiple places. This change should be consistent across the codebase. The change in instructor/function_calls.py is consistent with the changes in other files.
5. instructor/process_response.py:253
  • Draft comment:
    The mode STRUCTURED_OUTPUTS has been changed to TOOLS_STRICT. Ensure this change is consistent across the codebase and that all references to STRUCTURED_OUTPUTS are updated accordingly.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The PR changes the mode from STRUCTURED_OUTPUTS to TOOLS_STRICT in multiple places. This change should be consistent across the codebase. The change in instructor/process_response.py is consistent with the changes in other files.
6. instructor/retry.py:111
  • Draft comment:
    The mode STRUCTURED_OUTPUTS has been changed to TOOLS_STRICT. Ensure this change is consistent across the codebase and that all references to STRUCTURED_OUTPUTS are updated accordingly.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The PR changes the mode from STRUCTURED_OUTPUTS to TOOLS_STRICT in multiple places. This change should be consistent across the codebase. The change in instructor/retry.py is consistent with the changes in other files.
7. instructor/mode.py:22
  • Draft comment:
    If TOOLS_STRICT is a new mode replacing STRUCTURED_OUTPUTS, ensure that documentation and tests are updated accordingly.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The PR introduces a new mode TOOLS_STRICT replacing STRUCTURED_OUTPUTS. This change should be reflected in the documentation and tests.
8. instructor/client.py:68
  • Draft comment:
    The create method overloads have been modified. Ensure that the documentation is updated to reflect these changes.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The create method overloads have been modified, but the changes are not documented. This is a library code change, so documentation should be updated.
9. instructor/client.py:132
  • Draft comment:
    The create_partial method overloads have been modified. Ensure that the documentation is updated to reflect these changes.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The create_partial method overloads have been modified, but the changes are not documented. This is a library code change, so documentation should be updated.
10. instructor/client.py:188
  • Draft comment:
    The create_iterable method overloads have been modified. Ensure that the documentation is updated to reflect these changes.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The create_iterable method overloads have been modified, but the changes are not documented. This is a library code change, so documentation should be updated.
11. instructor/client.py:232
  • Draft comment:
    The create_with_completion method overloads have been modified. Ensure that the documentation is updated to reflect these changes.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The create_with_completion method overloads have been modified, but the changes are not documented. This is a library code change, so documentation should be updated.
12. instructor/client.py:447
  • Draft comment:
    The from_litellm function has been modified. Ensure that the documentation is updated to reflect these changes.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The from_litellm function has been modified, but the changes are not documented. This is a library code change, so documentation should be updated.

Workflow ID: wflow_QbK0th4AZ1Hxsg9H


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@ivanleomk ivanleomk force-pushed the introducing-structured-outputs branch from b3d4dec to e1eb60c Compare August 21, 2024 02:01
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on e2dda01 in 23 seconds

More details
  • Looked at 437 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. docs/blog/posts/introducing-structured-outputs.md:176
  • Draft comment:
    Typo in 'targetted'. Consider changing it to 'targeted'.
This built-in retry logic allows for targeted correction to the generated response, ensuring that outputs are not only consistent with your schema but also correct for your use-case. This is invaluable in building reliable LLM systems.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The blog post contains a typo in the word 'targetted'.
2. docs/blog/posts/introducing-structured-outputs.md:284
  • Draft comment:
    Typo in 'swtich'. Consider changing it to 'switch'.
For example, the switch from OpenAI to Anthropic requires only three adjustments
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The blog post contains a typo in the word 'swtich'.

Workflow ID: wflow_a7twIC67RZiZTyjh


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

- OpenAI
authors:
- ivanleomk
---
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a new markdown file, ensure it's added to mkdocs.yml for proper documentation inclusion.

@ivanleomk ivanleomk force-pushed the introducing-structured-outputs branch from e2dda01 to 322ab5e Compare August 21, 2024 14:46
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 322ab5e in 25 seconds

More details
  • Looked at 437 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. docs/blog/posts/introducing-structured-outputs.md:71
  • Draft comment:
    Typo: 'targetted' should be 'targeted'.
This leaves developers without the means to implement retry logic so that the LLM can provide a targeted correction and regenerate its response.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The term 'targetted' is misspelled and should be corrected for clarity and professionalism.
2. docs/blog/posts/introducing-structured-outputs.md:126
  • Draft comment:
    Typo: 'satisfication' should be 'satisfaction'.
potentially impacting the overall user satisfaction and retention rates.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The word 'satisfication' is misspelled and should be corrected to 'satisfaction'.
3. docs/blog/posts/introducing-structured-outputs.md:284
  • Draft comment:
    Typo: 'swtich' should be 'switch'.
For example, the switch from OpenAI to Anthropic requires only three adjustments
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The word 'swtich' is misspelled and should be corrected to 'switch'.

Workflow ID: wflow_BhCLvKKLw9rRJ0hk


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 322ab5e in 17 seconds

More details
  • Looked at 437 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. docs/blog/posts/introducing-structured-outputs.md:71
  • Draft comment:
    Typo: 'targetted' should be 'targeted'.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The term 'targetted' is misspelled and should be corrected to 'targeted'.
2. docs/blog/posts/introducing-structured-outputs.md:176
  • Draft comment:
    Typo: 'corect' should be 'correct'.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The term 'corect' is misspelled and should be corrected to 'correct'.
3. docs/blog/posts/introducing-structured-outputs.md:284
  • Draft comment:
    Typo: 'swtich' should be 'switch'.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The word 'swtich' is misspelled and should be corrected to 'switch'.
4. docs/blog/posts/introducing-structured-outputs.md:9
  • Draft comment:
    Since this is a new blog post, ensure it is added to mkdocs.yml for proper documentation.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The document is a new blog post, so it should be added to mkdocs.yml for proper documentation.

Workflow ID: wflow_BhCLvKKLw9rRJ0hk


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 378f7fa in 1 minute and 5 seconds

More details
  • Looked at 39 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. docs/blog/posts/introducing-structured-outputs.md:21
  • Draft comment:
    Add a period at the end of the sentence for consistency.
1. **Limited Validation And Retry Logic**: Structured Outputs ensure adherence to the schema but not useful content. You might get perfectly formatted yet unhelpful responses.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The blog post contains several instances where sentences are missing periods at the end. This is a grammatical issue that should be corrected for consistency and professionalism.
2. docs/blog/posts/introducing-structured-outputs.md:22
  • Draft comment:
    Add a period at the end of the sentence for consistency.
2. **Streaming Challenges**: Parsing raw JSON objects from streamed responses with the sdk is error-prone and inefficient.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The blog post contains several instances where sentences are missing periods at the end. This is a grammatical issue that should be corrected for consistency and professionalism.
3. docs/blog/posts/introducing-structured-outputs.md:23
  • Draft comment:
    Add a period at the end of the sentence for consistency.
3. **Unpredictable Latency Issues** : Structured Outputs suffers from random latency spikes that might result in an almost 20x increase in response time.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The blog post contains several instances where sentences are missing periods at the end. This is a grammatical issue that should be corrected for consistency and professionalism.
4. docs/blog/posts/introducing-structured-outputs.md:18
  • Draft comment:
    Ensure this new blog post is added to the mkdocs.yml file for proper documentation inclusion.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The blog post is a new addition and should be included in the mkdocs.yml file for documentation.

Workflow ID: wflow_LqRNtfeHVX8MSe5U


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@ivanleomk ivanleomk requested a review from jxnl August 21, 2024 15:15
Copy link
Owner

@jxnl jxnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

going to approve merge any time, but you should also mention 'vendor lock in' as one of the challenges.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on 1e49b66 in 25 seconds

More details
  • Looked at 15 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_T51vXKcMq76QInGk


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

But before you do so, three key challenges remain:

1. **Limited Validation And Retry Logic**: Structured Outputs ensure adherence to the schema but not useful content. You might get perfectly formatted yet unhelpful responses
2. **Streaming Challenges**: Parsing raw JSON objects from streamed responses with the sdk is error-prone and inefficient
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure this new blog post is added to the mkdocs.yml file for documentation.

@ivanleomk ivanleomk merged commit 30f4e2d into structured-output-v2 Aug 22, 2024
6 of 7 checks passed
@ivanleomk ivanleomk deleted the introducing-structured-outputs branch August 22, 2024 04:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants