🐛 Describe the bug
Hi folks,
Thanks again for your work on this library.
I noticed an issue where similarity scores do not get updated when I change my expected fields. Only when I re-run the experiment are the values updated.
Bug
Steps to reproduce:
from prompttools.experiment import OpenAIChatExperiment

models = ["gpt-3.5-turbo", "gpt-3.5-turbo-0613"]
messages = [
    [
        {"role": "system", "content": "Who is the first president of the US? Give me only the name"},
    ]
]
temperatures = [0.0]

experiment = OpenAIChatExperiment(models, messages, temperature=temperatures)
experiment.run()
experiment.visualize()
from prompttools.utils import semantic_similarity

experiment.evaluate("similar_to_expected", semantic_similarity, expected=["George Washington"] * 2)
experiment.visualize()
from prompttools.utils import semantic_similarity

experiment.evaluate("similar_to_expected", semantic_similarity, expected=["Lady Gaga"] * 2)
experiment.visualize()  # the scores shown here are unchanged, as if "Lady Gaga" were semantically identical to "George Washington"
In my opinion, evaluate() should re-compute metrics every time it is called, rather than being coupled to run(). I haven't tested other eval_fns, but it may be worth checking whether they behave the same way.
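As a rough sketch of the behavior I have in mind (using only the evaluate() call shown above; evaluate_fresh is a hypothetical helper, not part of prompttools), each call writes its scores into a fresh column so they are always recomputed:

# Hypothetical helper (not part of prompttools): evaluate into a new,
# suffixed metric column on every call so the scores are never skipped.
_eval_count = 0

def evaluate_fresh(exp, metric_name, eval_fn, **kwargs):
    global _eval_count
    _eval_count += 1
    exp.evaluate(f"{metric_name}_{_eval_count}", eval_fn, **kwargs)

evaluate_fresh(experiment, "similar_to_expected", semantic_similarity, expected=["Lady Gaga"] * 2)
experiment.visualize()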
Your observation is correct. Currently, if a metric already exists ("similar_to_expected" in your case), evaluate() emits a warning and skips re-computation (as seen in your notebook: "WARNING: similar_to_expected is already present, skipping") rather than overwriting it.
If you change the metric name in the second .evaluate call (e.g. experiment.evaluate("similar_to_expected_2", ...)), it will compute a new column, as shown below.
We are open to overwriting the existing metric in this situation. Let us know what you think.
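For example, under the current behavior the workaround would look something like this (a sketch reusing the snippet from above; the metric name similar_to_expected_2 is arbitrary):

from prompttools.utils import semantic_similarity

# Evaluating under a new metric name adds a new column with freshly computed
# scores, instead of hitting the "already present, skipping" warning.
experiment.evaluate("similar_to_expected_2", semantic_similarity, expected=["Lady Gaga"] * 2)
experiment.visualize()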