
Actions: openai/evals

981 workflow runs

Add support for new models (gpt-4o, o1-preview and o1-mini)
Run unit tests #1781: Pull request #1558 opened by sakher
September 15, 2024 09:46 Action required sakher:add-o1-models-support
Bugfixing completion stats break with new reasoning tokens release
Run unit tests #1780: Pull request #1555 opened by lucapericlp
September 13, 2024 10:04 Action required lucapericlp:main
anthropic_solver.py
Run unit tests #1779: Pull request #1554 opened by iHuydang
September 4, 2024 09:01 Action required iHuydang:patch-1
Fix a bug in examples/mmlu.ipynb when using gpt-4o or gpt-4o-mini
Run unit tests #1776: Pull request #1551 opened by RobinWitch
August 25, 2024 16:39 Action required RobinWitch:fix_mmlu
Fix the is_chat_model function to work with gpt-4o
Run unit tests #1775: Pull request #1550 opened by LoryPack
August 22, 2024 15:16 3m 36s LoryPack:add-4o
Updating make-me-say to be compatible with Solvers
Run unit tests #1774: Pull request #1546 synchronize by lennart-finke
August 22, 2024 14:51 Action required lennart-finke:main
Updating make-me-say to be compatible with Solvers
Run new evals #2283: Pull request #1546 synchronize by lennart-finke
August 22, 2024 14:51 Action required lennart-finke:main
Remove global OpenAI client initialization
Run unit tests #1764: Pull request #1539 opened by michaelAlvarino
July 21, 2024 17:04 3m 40s michaelAlvarino:main
Remove global OpenAI client initialization
Run new evals #2276: Pull request #1539 opened by michaelAlvarino
July 21, 2024 17:04 2m 13s michaelAlvarino:main
[eval] Add IMO problems with exact answers (#1528)
Run unit tests #1763: Commit 234bcde pushed by kliu128
July 13, 2024 19:52 3m 49s main
Added Quran Eval & Simple Fact Model-Graded Definition
Run new evals #2271: Pull request #1511 synchronize by sakher
June 20, 2024 14:13 2m 22s sakher:quran-eval
Added Quran Eval & Simple Fact Model-Graded Definition
Run unit tests #1754: Pull request #1511 synchronize by sakher
June 20, 2024 14:13 3m 43s sakher:quran-eval
Fix problematic sample in Schelling Point
Run unit tests #1752: Pull request #1534 opened by JunShern
May 22, 2024 23:04 8m 5s jun/schellingpoint-fix
Fix problematic sample in Schelling Point
Run new evals #2270: Pull request #1534 opened by JunShern
May 22, 2024 23:04 4m 38s jun/schellingpoint-fix
eval pattern-concat-logic
Run unit tests #1735: Pull request #1508 synchronize by natanaelwf
May 9, 2024 13:18 3m 55s natanaelwf:pattern-concat-logic
eval pattern-concat-logic
Run new evals #2258: Pull request #1508 synchronize by natanaelwf
May 9, 2024 13:18 2m 25s natanaelwf:pattern-concat-logic
Release 3.0.1 (#1525)
Run unit tests #1733: Commit d3dc890 pushed by etr2460
May 1, 2024 00:50 4m 10s main
Release 3.0.1
Run unit tests #1732: Pull request #1525 opened by etr2460
May 1, 2024 00:24 3m 59s release/3.0.1
Make the torch dep optional (#1524)
Run unit tests #1731: Commit 1d3f11c pushed by etr2460
May 1, 2024 00:14 10m 41s main
Make the torch dep optional
Run unit tests #1730: Pull request #1524 synchronize by etr2460
April 30, 2024 23:56 4m 1s erik/torch-optional
Make the torch dep optional
Run unit tests #1729: Pull request #1524 opened by etr2460
April 30, 2024 23:52 2m 38s erik/torch-optional
Release 3.0.0 (#1520)
Run unit tests #1723: Commit 778caa6 pushed by etr2460
April 17, 2024 22:27 4m 5s main
Release 3.0.0
Run unit tests #1722: Pull request #1520 opened by etr2460
April 17, 2024 22:24 3m 54s release/3.0.0