
FrontierScience-Olympiad

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Mathematical Reasoning | FrontierScience-Olympiad | Accuracy 50.8 | 63 |
| Scientific Olympiad Reasoning | FrontierScience-Olympiad | Biology Accuracy 43.5 | 30 |
| Scientific Problem Solving | FrontierScience-Olympiad | Token Efficiency Ratio (B_method/BMV) 5.59 | 27 |
| Scientific olympiad problem solving | FrontierScience Olympiad | Accuracy 80 | 12 |
| LLM Self-Consistency Certification | FrontierScience Olympiad | Bonferroni Score 53 | 10 |
| Scientific problem solving | FrontierScience Olympiad N=20 (test) | Metric - | 0 |