
Added new techniques #810

Merged
merged 12 commits into from
Jul 9, 2024

ivanleomk
Collaborator

@ivanleomk ivanleomk commented Jul 5, 2024

Added some new techniques for validating and verifying LLM outputs

  1. Self-Verification
  2. Self-Calibration
  3. Cumulative Reasoning
  4. Reverse Chain of Thought
  5. Decomposition

🚀 This description was created by Ellipsis for commit b4cef66

Summary:

Added new techniques for validating and verifying LLM outputs, including Self-Verification, Self-Calibration, Cumulative Reasoning, Reverse Chain of Thought, and Decomposition, with updated documentation and navigation.

Key points:

  • Added Self-Verification technique in docs/prompting/self_criticism/self_verification.md
  • Added Self-Calibration technique in docs/prompting/self_criticism/self_calibration.md
  • Added Cumulative Reasoning technique in docs/prompting/self_criticism/cumulative_reason.md
  • Added Reverse Chain of Thought technique in docs/prompting/self_criticism/reversecot.md
  • Updated docs/prompting/self_criticism/self_refine.md with implementation details
  • Modified mkdocs.yml to include new techniques in the navigation structure
  • Added references to relevant research papers in each markdown file

Generated with ❤️ by ellipsis.dev

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Reviewed everything up to dc377ad in 30 seconds

More details
  • Looked at 539 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. mkdocs.yml:253
  • Draft comment:
    The new markdown files self_calibration.md, self_refine.md, and self_verification.md are not added to the navigation in mkdocs.yml. Please update the navigation section to include these new documents to ensure they are accessible in the generated site navigation.
  • Reason this comment was not posted:
    Confidence of 0% on close inspection, compared to threshold of 85%.

Workflow ID: wflow_ZoYkdu2hx4MdvAda


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

cloudflare-workers-and-pages bot commented Jul 5, 2024

Deploying instructor with Cloudflare Pages

Latest commit: 07497d3
Status: ✅  Deploy successful!
Preview URL: https://99ccf1ef.instructor.pages.dev
Branch Preview URL: https://self-verification.instructor.pages.dev

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on 72e8ec6 in 45 seconds

More details
  • Looked at 290 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. mkdocs.yml:255
  • Draft comment:
    The navigation titles for the newly added techniques in mkdocs.yml should match the document names for clarity. Consider updating the titles to match the document names exactly as they appear in the PR description.
      - Self-Verification: 'prompting/self_criticism/self_verification.md'
      - Self-Calibration: 'prompting/self_criticism/self_calibration.md'
  • Reason this comment was not posted:
    Confidence of 0% on close inspection, compared to threshold of 85%.

Workflow ID: wflow_2GpsgPIpoVkIyTg9


Contributor

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on c6d7465 in 38 seconds

More details
  • Looked at 252 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. docs/prompting/self_criticism/self_verification.md:2
  • Draft comment:
    The description contains a grammatical error. Please correct "it's" to "its" to use the correct possessive form:
description: "Self Verification involves getting language models to generate a candidate response before evaluating each individual intermediate reasoning step to verify if its logical entailment holds"
  • Reason this comment was not posted:
    Confidence of 80% on close inspection, compared to threshold of 85%.
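The description quoted in the draft comment above summarizes Self-Verification as generating a candidate response and then checking whether each intermediate reasoning step logically holds. The selection loop can be sketched without any model calls, assuming each candidate has already been split into steps. `Candidate`, `verification_score`, `select_verified`, and the toy `check_addition` verifier are hypothetical names, not code from this PR or the docs; in practice `verify_step` would be a second LLM call rather than arithmetic.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    """Hypothetical container: a candidate's intermediate steps plus its final answer."""
    steps: List[str]
    answer: str


def verification_score(candidate: Candidate, verify_step: Callable[[str], bool]) -> int:
    """Count how many intermediate steps the verifier accepts."""
    return sum(1 for step in candidate.steps if verify_step(step))


def select_verified(candidates: List[Candidate], verify_step: Callable[[str], bool]) -> Candidate:
    """Return the candidate whose steps pass verification most often.

    verify_step stands in for an LLM call that checks whether a single step
    follows logically; here it can be any boolean function.
    """
    return max(candidates, key=lambda c: verification_score(c, verify_step))


def check_addition(step: str) -> bool:
    """Toy verifier: accept a step "a + b = c" only if the arithmetic is correct."""
    lhs, rhs = step.split("=")
    a, b = (int(x) for x in lhs.split("+"))
    return a + b == int(rhs)


candidates = [
    Candidate(steps=["2 + 3 = 6", "6 + 4 = 10"], answer="10"),  # first step is wrong
    Candidate(steps=["2 + 3 = 5", "5 + 4 = 9"], answer="9"),    # both steps hold
]
best = select_verified(candidates, check_addition)
print(best.answer)  # 9
```

Scoring by the number of accepted steps keeps an answer available even when no candidate is fully verified; a stricter variant could discard any candidate with a failing step.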

Workflow ID: wflow_HpBzdpeBnAMkPtQU


Contributor

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on 1ae0daa in 34 seconds

More details
  • Looked at 13 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. mkdocs.yml:254
  • Draft comment:
    The PR description indicates that new documents were added (Self-Verification and Self-Calibration), but these are not included in the navigation structure in mkdocs.yml. Please update the navigation to include these new documents to ensure they are accessible to users.
  • Reason this comment was not posted:
    Confidence of 0% on close inspection, compared to threshold of 85%.

Workflow ID: wflow_TaStpnGb5vwToWrI


Contributor

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on 3ad7c73 in 35 seconds

More details
  • Looked at 13 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. mkdocs.yml:254
  • Draft comment:
    The PR description indicates the addition of Self-Verification and Self-Calibration techniques, but the mkdocs.yml does not reflect these additions in the navigation structure. Please update the navigation to include these new documents to ensure they are accessible.
      - Self-Verification: 'prompting/self_criticism/self_verification.md'
      - Self-Calibration: 'prompting/self_criticism/self_calibration.md'
  • Reason this comment was not posted:
    Confidence of 0% on close inspection, compared to threshold of 85%.

Workflow ID: wflow_FBsYHRxwkkHLnIqh


Contributor

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on d7cad25 in 59 seconds

More details
  • Looked at 160 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. docs/prompting/decomposition/decomp.md:81
  • Draft comment:
    Consider making the model identifier configurable instead of hardcoding it to gpt-4o. This can be done by passing it as a parameter or setting it through an environment variable to enhance flexibility and maintainability.
        model=configurable_model_identifier,
  • Reason this comment was not posted:
    Confidence of 30% on close inspection, compared to threshold of 85%.
2. docs/prompting/decomposition/decomp.md:69
  • Draft comment:
    Consider adding error handling around the API call in derive_action_plan to manage exceptions gracefully. This could involve using try-except blocks to catch and handle specific exceptions that might occur during the API request.
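# Assumes `client` (an instructor-patched OpenAI client) and the `ActionPlan`
# response model are defined earlier in decomp.md.
def derive_action_plan(task_description: str):
    try:
        return client.chat.completions.create(
            messages=[
                {
                    "role": "system",
                    # Adjacent string literals are joined, so the prompt can span lines
                    "content": (
                        "Generate an action plan to help you complete "
                        "the task outlined by the user"
                    ),
                },
                {"role": "user", "content": task_description},
            ],
            response_model=ActionPlan,
            max_retries=3,
            model="gpt-4o",
        )
    except Exception as e:
        # Report the failure and return None so the caller can handle it
        print(f"An error occurred: {e}")
        return None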
  • Reason this comment was not posted:
    Confidence of 30% on close inspection, compared to threshold of 85%.
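The first draft comment above suggests not hardcoding `gpt-4o`. A minimal sketch of that idea, assuming an environment variable named `INSTRUCTOR_MODEL` (a hypothetical name, not something instructor or the docs define) and keeping `gpt-4o` as the fallback; the optional `env` mapping is only there so the lookup can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional

# Fallback matching the model used in the current docs example.
DEFAULT_MODEL = "gpt-4o"
# Hypothetical variable name chosen for this sketch.
MODEL_ENV_VAR = "INSTRUCTOR_MODEL"


def resolve_model(override: Optional[str] = None, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model identifier: explicit override, then env var, then default."""
    if override:
        return override
    env = os.environ if env is None else env
    return env.get(MODEL_ENV_VAR, DEFAULT_MODEL)
```

The call in `derive_action_plan` would then pass `model=resolve_model()` instead of the literal string.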

Workflow ID: wflow_bsjh57rWMOoDkTtQ


Contributor

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on 3205b28 in 52 seconds

More details
  • Looked at 13 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. docs/prompting/decomposition/decomp.md:15
  • Draft comment:
    The PR description mentions updates to several documents and mkdocs.yml, but the diff provided only shows a minor change in this document. Please ensure that all intended changes are included in the PR.
  • Reason this comment was not posted:
    Confidence of 0% on close inspection, compared to threshold of 85%.

Workflow ID: wflow_GI4vp2TZ17DvYm4F


Contributor

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on 200bae4 in 43 seconds

More details
  • Looked at 13 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. mkdocs.yml:256
  • Draft comment:
    The PR description indicates that new documents self_verification.md and self_calibration.md were added, but they are not included in the mkdocs.yml navigation structure. Please update the navigation to include these new documents to ensure they are accessible from the website.
    - Self-Verification: 'prompting/self_criticism/self_verification.md'
    - Self-Calibration: 'prompting/self_criticism/self_calibration.md'
  • Reason this comment was not posted:
    Confidence of 0% on close inspection, compared to threshold of 85%.

Workflow ID: wflow_bC7y9HLDcSqNotei


@ivanleomk ivanleomk requested review from jxnl and shreya-51 July 5, 2024 07:58
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on 11c7a23 in 25 seconds

More details
  • Looked at 121 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_4JHkP6NcqyQWDBpy


Review A: {review_a}
Review B: {review_b}.

Pick your answer from ['Review A', 'Review B', 'both',
Owner

Should this part go into the docstring of PairwiseEvaluation?

Collaborator Author

Hmm, personally I think it would be better in the prompt so that it's a bit clearer. It seems a bit abrupt to have a standalone "pick your answer" in the docstring. Or are you referring to something like:

from typing import Literal

from pydantic import BaseModel


class PairwiseEvaluation(BaseModel):
    """
    This is a class that represents a comparison of two reviews. Make sure to
    identify which one is more aligned with the target sentiment. Pick your
    answer from the given alignment_result values: Review A, Review B or Both.
    """

    feedback: str
    alignment_result: Literal[
        "Review A",
        "Review B",
        "Both",
    ]
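The `Literal` field is what constrains the answer to the allowed labels. A minimal sketch, assuming pydantic v2 (`model_validate`), showing that only the exact labels validate. Note that the truncated prompt snippet above lists `'both'` in lowercase while the proposed `Literal` uses `"Both"`; because `Literal` matching is exact, a lowercase answer fails validation (instructor would then re-ask if `max_retries` is set). The template below uses `"Both"` so prompt and model agree; the review texts are made-up sample inputs.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class PairwiseEvaluation(BaseModel):
    feedback: str
    alignment_result: Literal["Review A", "Review B", "Both"]


# Illustrative template modelled on the snippet under review, with "Both" capitalized.
PROMPT = (
    "Review A: {review_a}\n"
    "Review B: {review_b}\n\n"
    "Pick your answer from ['Review A', 'Review B', 'Both']"
)
prompt = PROMPT.format(review_a="Great battery life.", review_b="Died in a day.")

# A response using one of the allowed labels validates cleanly.
ok = PairwiseEvaluation.model_validate(
    {"feedback": "Both reviews match the target sentiment.", "alignment_result": "Both"}
)

# Literal matching is case-sensitive, so a lowercase "both" is rejected.
try:
    PairwiseEvaluation.model_validate({"feedback": "...", "alignment_result": "both"})
    lowercase_accepted = True
except ValidationError:
    lowercase_accepted = False

print(lowercase_accepted)  # False
```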

Owner

@jxnl jxnl left a comment

some formatting and prompt positioning looks really good!

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

❌ Changes requested. Incremental review on 1074e0a in 1 minute and 3 seconds

More details
  • Looked at 328 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_84DoPjq34RJ5okcT


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on b6a148a in 1 minute and 31 seconds

More details
  • Looked at 328 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_12vuXnUH0wq9cf1I


@ivanleomk ivanleomk requested a review from jxnl July 8, 2024 15:37
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on b4cef66 in 1 minute and 4 seconds

More details
  • Looked at 331 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. docs/prompting/self_criticism/self_refine.md:1
  • Draft comment:
    The content of this file has been replaced with placeholders and marked as [wip], which contradicts the PR description stating that new techniques and documentation updates were added. Please ensure the content is complete before merging.

This comment applies to self_verification.md as well.

  • Reason this comment was not posted:
    Confidence of 0% on close inspection, compared to threshold of 85%.

Workflow ID: wflow_aOMZ5GykQE7CekxT


@ivanleomk ivanleomk merged commit 35ef440 into main Jul 9, 2024
8 of 16 checks passed
@ivanleomk ivanleomk deleted the self-verification branch July 9, 2024 16:35
Labels
None yet
Projects
None yet
2 participants