Replace Claude Safety Judge with Llama3.1 #2974

farzaank · 2024-09-04T21:12:10Z

Replaces Claude with Llama 3.1.

Motivation is that Claude would refuse judging for some model-scenario pairs. We decided to try 2 open models as a judge as an alternative: Llama Guard 3 and Llama 3.1 if that didn't work out.

Llama Guard 3, has the behavior of occasionally rating refusals as unsafe when they mention unsafe subject matter (this false positive rate is well documented: "while deploying Llama Guard 3 will likely improve the safety of your system, it might increase refusals to benign prompts (False Positives)") so to keep behavioral consistency with our other judge (GPT-4o), we use Llama 3.1 instead. For now, I set the size to the largest one.

farzaank · 2024-09-04T21:12:41Z

src/helm/benchmark/annotation/anthropic_red_team_annotator.py

@@ -2,7 +2,7 @@

 from helm.benchmark.adaptation.request_state import RequestState
 from helm.benchmark.annotation.annotator import Annotator
-from helm.benchmark.annotation.score_util import score_with_reasoning
+from helm.benchmark.annotation.model_as_judge import score_with_reasoning


missed this in the last PR for some files, so here it is

yifanmai

looks good, thanks!

farzaank added 2 commits September 4, 2024 01:21

update

3e5fc6e

replace claude with llama

893545a

farzaank requested a review from yifanmai September 4, 2024 21:12

farzaank commented Sep 4, 2024

View reviewed changes

removed llama guard stuff

2137166

yifanmai approved these changes Sep 5, 2024

View reviewed changes

yifanmai merged commit 13b718a into main Sep 5, 2024
6 checks passed

yifanmai deleted the farzaan/llama-safety branch September 5, 2024 22:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace Claude Safety Judge with Llama3.1 #2974

Replace Claude Safety Judge with Llama3.1 #2974

farzaank commented Sep 4, 2024 •

edited

Loading

farzaank Sep 4, 2024

yifanmai left a comment

Replace Claude Safety Judge with Llama3.1 #2974

Replace Claude Safety Judge with Llama3.1 #2974

Conversation

farzaank commented Sep 4, 2024 • edited Loading

farzaank Sep 4, 2024

Choose a reason for hiding this comment

yifanmai left a comment

Choose a reason for hiding this comment

farzaank commented Sep 4, 2024 •

edited

Loading