OlyBench

Benchmarks

Task Name               Dataset Name    SOTA Result              Trend
Mathematical Reasoning  OlyBench        Accuracy 44.7            59
Symbolic Reasoning      OlyBench        Accuracy 36.5            25
Mathematical Reasoning  OlyBench        Pass@1 Accuracy 47.1     22
Physics Reasoning       OlyBench Phy    Acceptance Length 4.5    12
Mathematical Reasoning  OlyBench Math   Acceptance Length 4.7    12