Added new techniques #810

ivanleomk · 2024-07-05T03:36:54Z

Added some new techniques for validating and verifying LLM outputs:

Self-Verification
Self-Calibration
Cumulative Reasoning
Reverse Chain of Thought
Decomposition

🚀	This description was created by Ellipsis for commit `b4cef66`

Summary:

Added new techniques for validating and verifying LLM outputs, including Self-Verification, Self-Calibration, Cumulative Reasoning, Reverse Chain of Thought, and Decomposition, with updated documentation and navigation.

Key points:

Added Self-Verification technique in docs/prompting/self_criticism/self_verification.md
Added Self-Calibration technique in docs/prompting/self_criticism/self_calibration.md
Added Cumulative Reasoning technique in docs/prompting/self_criticism/cumulative_reason.md
Added Reverse Chain of Thought technique in docs/prompting/self_criticism/reversecot.md
Updated docs/prompting/self_criticism/self_refine.md with implementation details
Modified mkdocs.yml to include new techniques in the navigation structure
Added references to relevant research papers in each markdown file

Generated with ❤️ by ellipsis.dev

ellipsis-dev

👍 Looks good to me! Reviewed everything up to dc377ad in 30 seconds

Looked at 539 lines of code in 4 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. mkdocs.yml:253

Draft comment:
The new markdown files self_calibration.md, self_refine.md, and self_verification.md are not added to the navigation in mkdocs.yml. Please update the navigation section to include these new documents to ensure they are accessible in the generated site navigation.
Reason this comment was not posted:
Confidence of 0% on close inspection, compared to threshold of 85%.

Workflow ID: wflow_ZoYkdu2hx4MdvAda

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

cloudflare-workers-and-pages · 2024-07-05T03:38:01Z

Deploying instructor with Cloudflare Pages

Latest commit:	`07497d3`
Status:	✅ Deploy successful!
Preview URL:	https://99ccf1ef.instructor.pages.dev
Branch Preview URL:	https://self-verification.instructor.pages.dev

ellipsis-dev

👍 Looks good to me! Incremental review on 72e8ec6 in 45 seconds

Looked at 290 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. mkdocs.yml:255

Draft comment:
The navigation titles for the newly added techniques in mkdocs.yml should match the document names for clarity. Consider updating the titles to match the document names exactly as they appear in the PR description.

      - Self-Verification: 'prompting/self_criticism/self_verification.md'
      - Self-Calibration: 'prompting/self_criticism/self_calibration.md'

Reason this comment was not posted:
Confidence of 0% on close inspection, compared to threshold of 85%.

Workflow ID: wflow_2GpsgPIpoVkIyTg9

ellipsis-dev

👍 Looks good to me! Incremental review on c6d7465 in 38 seconds

Looked at 252 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. docs/prompting/self_criticism/self_verification.md:2

Draft comment:
The description contains a grammatical error. Please correct "it's" to "its" to use the correct possessive form:

description: "Self Verification involves getting language models to generate a candidate response before evaluating each individual intermediate reasoning step to verify if its logical entailment holds"

Reason this comment was not posted:
Confidence of 80% on close inspection, compared to threshold of 85%.
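The Self-Verification description quoted in this draft comment (generate a candidate response, then check each intermediate reasoning step for logical entailment) can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the code in `self_verification.md`: the `check_step` callable is a hypothetical stand-in for an LLM judgement, and the names `StepCheck`, `verify_candidate`, and `toy_arithmetic_checker` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class StepCheck:
    step: str
    entailed: bool


# Hypothetical verifier signature: (earlier steps, current step) -> does the step follow?
StepVerifier = Callable[[Sequence[str], str], bool]


def verify_candidate(
    steps: Sequence[str], check_step: StepVerifier
) -> Tuple[bool, List[StepCheck]]:
    """Check every intermediate step of a candidate response against the steps before it."""
    checks: List[StepCheck] = []
    for i, step in enumerate(steps):
        # Each step is judged only against the premises established so far.
        checks.append(StepCheck(step, check_step(steps[:i], step)))
    # The candidate is accepted only if every step's entailment holds.
    return all(c.entailed for c in checks), checks


def toy_arithmetic_checker(previous: Sequence[str], step: str) -> bool:
    # Stand-in for an LLM call: accept "a + b = c" lines whose arithmetic is right.
    lhs, rhs = step.split("=")
    a, b = (int(x) for x in lhs.split("+"))
    return a + b == int(rhs)


ok, report = verify_candidate(["2 + 3 = 5", "5 + 4 = 10"], toy_arithmetic_checker)
print(ok)  # False: the second step does not hold
```

In a real pipeline the checker would be a model call returning a structured verdict; keeping it as a plain callable makes the per-step loop testable without network access.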

Workflow ID: wflow_HpBzdpeBnAMkPtQU

ellipsis-dev

👍 Looks good to me! Incremental review on 1ae0daa in 34 seconds

Looked at 13 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. mkdocs.yml:254

Draft comment:
The PR description indicates that new documents were added (Self-Verification and Self-Calibration), but these are not included in the navigation structure in mkdocs.yml. Please update the navigation to include these new documents to ensure they are accessible to users.
Reason this comment was not posted:
Confidence of 0% on close inspection, compared to threshold of 85%.

Workflow ID: wflow_TaStpnGb5vwToWrI

ellipsis-dev

👍 Looks good to me! Incremental review on 3ad7c73 in 35 seconds

Looked at 13 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. mkdocs.yml:254

Draft comment:
The PR description indicates the addition of Self-Verification and Self-Calibration techniques, but the mkdocs.yml does not reflect these additions in the navigation structure. Please update the navigation to include these new documents to ensure they are accessible.

      - Self-Verification: 'prompting/self_criticism/self_verification.md'
      - Self-Calibration: 'prompting/self_criticism/self_calibration.md'

Reason this comment was not posted:
Confidence of 0% on close inspection, compared to threshold of 85%.

Workflow ID: wflow_FBsYHRxwkkHLnIqh

ellipsis-dev

👍 Looks good to me! Incremental review on d7cad25 in 59 seconds

Looked at 160 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. docs/prompting/decomposition/decomp.md:81

Draft comment:
Consider making the model identifier configurable instead of hardcoding it to gpt-4o. This can be done by passing it as a parameter or setting it through an environment variable to enhance flexibility and maintainability.

        model=configurable_model_identifier,

Reason this comment was not posted:
Confidence of 30% on close inspection, compared to threshold of 85%.
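One way to act on this draft comment's suggestion of a configurable model identifier is to read the model name from an environment variable and fall back to a default. This is a minimal sketch: the variable name `INSTRUCTOR_MODEL` and the helper `resolve_model` are hypothetical, and `gpt-4o` is kept as the default only because it is what the docs example hardcodes.

```python
import os
from typing import Mapping, Optional


def resolve_model(env: Optional[Mapping[str, str]] = None, default: str = "gpt-4o") -> str:
    """Return the model identifier from INSTRUCTOR_MODEL (hypothetical name), else the default."""
    source = os.environ if env is None else env
    value = source.get("INSTRUCTOR_MODEL", "").strip()
    # Fall back to the default when the variable is unset or blank.
    return value or default


# The resolved name would then be passed as model=resolve_model() in the
# client.chat.completions.create(...) call.
```

Accepting an optional mapping keeps the helper easy to test without mutating the real process environment.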

2. docs/prompting/decomposition/decomp.md:69

Draft comment:
Consider adding error handling around the API call in derive_action_plan to manage exceptions gracefully. This could involve using try-except blocks to catch and handle specific exceptions that might occur during the API request.

try:
    return client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": "Generate an action plan to help you complete "
                "the task outlined by the user",
            },
            {"role": "user", "content": task_description},
        ],
        response_model=ActionPlan,
        max_retries=3,
        model="gpt-4o",
    )
except Exception as e:
    print(f"An error occurred: {e}")
    return None

Reason this comment was not posted:
Confidence of 30% on close inspection, compared to threshold of 85%.

Workflow ID: wflow_bsjh57rWMOoDkTtQ

ellipsis-dev

👍 Looks good to me! Incremental review on 3205b28 in 52 seconds

Looked at 13 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. docs/prompting/decomposition/decomp.md:15

Draft comment:
The PR description mentions updates to several documents and mkdocs.yml, but the diff provided only shows a minor change in this document. Please ensure that all intended changes are included in the PR.
Reason this comment was not posted:
Confidence of 0% on close inspection, compared to threshold of 85%.

Workflow ID: wflow_GI4vp2TZ17DvYm4F

ellipsis-dev

👍 Looks good to me! Incremental review on 200bae4 in 43 seconds

Looked at 13 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. mkdocs.yml:256

Draft comment:
The PR description indicates that new documents self_verification.md and self_calibration.md were added, but they are not included in the mkdocs.yml navigation structure. Please update the navigation to include these new documents to ensure they are accessible from the website.

    - Self-Verification: 'prompting/self_criticism/self_verification.md'
    - Self-Calibration: 'prompting/self_criticism/self_calibration.md'

Reason this comment was not posted:
Confidence of 0% on close inspection, compared to threshold of 85%.

Workflow ID: wflow_bC7y9HLDcSqNotei

ellipsis-dev

👍 Looks good to me! Incremental review on 11c7a23 in 25 seconds

Looked at 121 lines of code in 5 files
Skipped 0 files when reviewing.
Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_4JHkP6NcqyQWDBpy

docs/prompting/decomposition/decomp.md

docs/prompting/self_criticism/cumulative_reason.md

docs/prompting/self_criticism/reversecot.md

jxnl · 2024-07-06T17:21:39Z

docs/prompting/self_criticism/self_refine.md

+                Review A: {review_a}
+                Review B: {review_b}.
+
+                Pick your answer from ['Review A', 'Review B', 'both',


Should this part go into the docstring of PairwiseEvaluation?

Hmm, I personally think it would be better in the prompt so that it's a bit clearer; a standalone "pick your answer" in the docstring seems a bit abrupt. Or are you referring to something like:

from typing import Literal

from pydantic import BaseModel


class PairwiseEvaluation(BaseModel):
    """
    This is a class that represents a comparison of two reviews. Make sure to
    identify which one is more aligned with the target sentiment. Pick your
    answer from the given alignment_result values - Review A, Review B or Both
    """

    feedback: str
    alignment_result: Literal[
        "Review A",
        "Review B",
        "Both",
    ]
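A middle ground between the two positions in this thread is to keep the "pick your answer" instruction in the prompt but render the allowed answers from the same `Literal` type the response model uses, so the prompt and the schema cannot drift apart. This is a stdlib-only sketch: `AlignmentResult`, `build_prompt`, and `parse_answer` are hypothetical names, and the prompt wording is illustrative rather than the wording in `self_refine.md`.

```python
from typing import Literal, get_args

# Same answer set as PairwiseEvaluation.alignment_result in the comment above.
AlignmentResult = Literal["Review A", "Review B", "Both"]
ALLOWED_ANSWERS = get_args(AlignmentResult)


def build_prompt(review_a: str, review_b: str) -> str:
    # The answer options are rendered from the Literal, not typed out by hand.
    options = ", ".join(repr(a) for a in ALLOWED_ANSWERS)
    return (
        f"Review A: {review_a}\n"
        f"Review B: {review_b}\n\n"
        f"Pick your answer from [{options}]."
    )


def parse_answer(raw: str) -> str:
    """Rough stand-in for the check Pydantic applies to a Literal field."""
    answer = raw.strip()
    if answer not in ALLOWED_ANSWERS:
        raise ValueError(f"{answer!r} is not one of {ALLOWED_ANSWERS}")
    return answer
```

Note that the match is case-sensitive, so the lowercase `'both'` that appears in the diff's prompt would be rejected against a `"Both"` literal; keeping one source of truth avoids that mismatch.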

jxnl

some formatting and prompt positioning looks really good!

ellipsis-dev

❌ Changes requested. Incremental review on 1074e0a in 1 minute and 3 seconds

Looked at 328 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_84DoPjq34RJ5okcT

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment.

docs/prompting/self_criticism/cumulative_reason.md

ellipsis-dev

👍 Looks good to me! Incremental review on b6a148a in 1 minute and 31 seconds

Looked at 328 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_12vuXnUH0wq9cf1I

ellipsis-dev

👍 Looks good to me! Incremental review on b4cef66 in 1 minute and 4 seconds

Looked at 331 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. docs/prompting/self_criticism/self_refine.md:1

Draft comment:
The content of this file has been replaced with placeholders and marked as [wip], which contradicts the PR description stating that new techniques and documentation updates were added. Please ensure the content is complete before merging.

This comment applies to self_verification.md as well.

Reason this comment was not posted:
Confidence of 0% on close inspection, compared to threshold of 85%.

Workflow ID: wflow_aOMZ5GykQE7CekxT

Added more prompting techniques

ellipsis-dev bot reviewed Jul 5, 2024

ivanleomk requested review from jxnl and shreya-51 July 5, 2024 07:58

ellipsis-dev bot reviewed Jul 5, 2024

jxnl reviewed Jul 6, 2024

docs/prompting/decomposition/decomp.md (resolved)

jxnl reviewed Jul 6, 2024

docs/prompting/self_criticism/cumulative_reason.md (outdated, resolved)

jxnl reviewed Jul 6, 2024

docs/prompting/self_criticism/reversecot.md (outdated, resolved)

jxnl reviewed Jul 6, 2024

docs/prompting/self_criticism/reversecot.md (outdated, resolved)

jxnl reviewed Jul 6, 2024

jxnl requested changes Jul 6, 2024

ivanleomk added 10 commits July 8, 2024 22:38

Fixed up two new techniques

4d421a4

Added reverse COT

3ccdea5

Added cumulative reasoning

818ffd0

Fixed up mkdocs.yml

ca78688

Renamed link

d86a5e5

Fixed up decomp example

9e28497

Removed linenums

5e418a7

renamed the decomp

55712aa

Fixed small typos

7c1db32

Updated the docs with requested changes

1074e0a

ivanleomk force-pushed the self-verification branch from b6a148a to 1074e0a July 8, 2024 14:38

ellipsis-dev bot reviewed Jul 8, 2024

docs/prompting/self_criticism/cumulative_reason.md (resolved)

ellipsis-dev bot reviewed Jul 8, 2024

ivanleomk requested a review from jxnl July 8, 2024 15:37

Reverted the two

b4cef66

ellipsis-dev bot reviewed Jul 9, 2024

jxnl approved these changes Jul 9, 2024

Fixed up more techniques on decomp and chain of thought (#819)

07497d3

Added more prompting techniques

ivanleomk merged commit 35ef440 into main Jul 9, 2024
8 of 16 checks passed

ivanleomk deleted the self-verification branch July 9, 2024 16:35
