Competition-level Mathematics and Science Reasoning on OlympiadBench (Accuracy)

22.53 Accuracy

Repeated Sampling

[Chart: Accuracy by date; Oct 4, 2025]
Updated 1mo ago

Evaluation Results

Date      Accuracy
2025.10   22.53
2025.10   21.87
2025.10   18.35
2025.10   17.47
2025.10   15.27
2025.10   12.75