Commit 88a2a79

Update leaderboard.md
1 parent 1f652c3 commit 88a2a79

1 file changed: +17 -17 lines changed

docs/leaderboard.md

Lines changed: 17 additions & 17 deletions
@@ -4,25 +4,25 @@
 
 # SciCode Leaderboard
 
-| Models | Main Problem Resolve Rate | <span style="background-color:lightgrey">Subproblem</span> |
+| Models | Main Problem Resolve Rate | <span style="color:grey">Subproblem</span> |
 |--------------------------|-------------------------------------|-------------------------------------|
-| 🥇 OpenAI o1-preview | <div align="center">7.7</div> | <div align="center" style="background-color:lightgrey">28.5</div> |
-| 🥈 Claude3.5-Sonnet | <div align="center">4.6</div> | <div align="center" style="background-color:lightgrey">26.0</div> |
-| 🥉 Claude3.5-Sonnet (new) | <div align="center">4.6</div> | <div align="center" style="background-color:lightgrey">25.3</div> |
-| Deepseek-Coder-v2 | <div align="center">3.1</div> | <div align="center" style="background-color:lightgrey">21.2</div> |
-| GPT-4o | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">25.0</div> |
-| GPT-4-Turbo | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">22.9</div> |
-| OpenAI o1-mini | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">22.2</div> |
-| Gemini 1.5 Pro | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">21.9</div> |
-| Claude3-Opus | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">21.5</div> |
-| Llama-3.1-405B-Chat | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">19.8</div> |
-| Claude3-Sonnet | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">17.0</div> |
-| Qwen2-72B-Instruct | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">17.0</div> |
-| Llama-3.1-70B-Chat | <div align="center">0.0</div> | <div align="center" style="background-color:lightgrey">17.0</div> |
-| Mixtral-8x22B-Instruct | <div align="center">0.0</div> | <div align="center" style="background-color:lightgrey">16.3</div> |
-| Llama-3-70B-Chat | <div align="center">0.0</div> | <div align="center" style="background-color:lightgrey">14.6</div> |
+| 🥇 OpenAI o1-preview | <div align="center">**7.7**</div> | <div align="center" style="color:grey">28.5</div> |
+| 🥈 Claude3.5-Sonnet | <div align="center">**4.6**</div> | <div align="center" style="color:grey">26.0</div> |
+| 🥉 Claude3.5-Sonnet (new) | <div align="center">**4.6**</div> | <div align="center" style="color:grey">25.3</div> |
+| Deepseek-Coder-v2 | <div align="center">**3.1**</div> | <div align="center" style="color:grey">21.2</div> |
+| GPT-4o | <div align="center">**1.5**</div> | <div align="center" style="color:grey">25.0</div> |
+| GPT-4-Turbo | <div align="center">**1.5**</div> | <div align="center" style="color:grey">22.9</div> |
+| OpenAI o1-mini | <div align="center">**1.5**</div> | <div align="center" style="color:grey">22.2</div> |
+| Gemini 1.5 Pro | <div align="center">**1.5**</div> | <div align="center" style="color:grey">21.9</div> |
+| Claude3-Opus | <div align="center">**1.5**</div> | <div align="center" style="color:grey">21.5</div> |
+| Llama-3.1-405B-Chat | <div align="center">**1.5**</div> | <div align="center" style="color:grey">19.8</div> |
+| Claude3-Sonnet | <div align="center">**1.5**</div> | <div align="center" style="color:grey">17.0</div> |
+| Qwen2-72B-Instruct | <div align="center">**1.5**</div> | <div align="center" style="color:grey">17.0</div> |
+| Llama-3.1-70B-Chat | <div align="center">**0.0**</div> | <div align="center" style="color:grey">17.0</div> |
+| Mixtral-8x22B-Instruct | <div align="center">**0.0**</div> | <div align="center" style="color:grey">16.3</div> |
+| Llama-3-70B-Chat | <div align="center">**0.0**</div> | <div align="center" style="color:grey">14.6</div> |
 
-Note: If the models tie in the Main Problem resolve rate, we will then compare the Subproblems.
+**Note: If the models tie in the Main Problem resolve rate, we will then compare the Subproblems.**
 
 <!-- Once you've added the results to the submission repository,
 bring back the table here -->
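
The tie-break rule in the note (rank by Main Problem resolve rate, then by Subproblem rate when models tie) can be sketched as a two-key sort. This is only an illustration and not the project's actual ranking script: the `rows` list and the `rank` helper are hypothetical names, and the sample values are copied from a few rows of the table in the diff.

```python
# Hypothetical sample: (model, main problem resolve rate, subproblem rate),
# copied from a few rows of the leaderboard table in the diff.
rows = [
    ("GPT-4o", 1.5, 25.0),
    ("Deepseek-Coder-v2", 3.1, 21.2),
    ("Claude3.5-Sonnet (new)", 4.6, 25.3),
    ("OpenAI o1-preview", 7.7, 28.5),
    ("Claude3.5-Sonnet", 4.6, 26.0),
    ("Llama-3.1-70B-Chat", 0.0, 17.0),
]


def rank(rows):
    """Order rows by main rate, breaking ties on subproblem rate (both descending)."""
    return sorted(rows, key=lambda r: (r[1], r[2]), reverse=True)


ranked = rank(rows)
for place, (model, main, sub) in enumerate(ranked, start=1):
    print(f"{place}. {model}: main={main}, sub={sub}")
```

With this ordering, Deepseek-Coder-v2 (3.1 main) stays ahead of GPT-4o (1.5 main) despite its lower Subproblem rate, and the two Claude3.5-Sonnet entries, tied at 4.6, are separated by their Subproblem rates.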