
OlymBench

Benchmarks

Task Name                        Dataset Name               SOTA Result                  Trend
Mathematical Reasoning           OlymBench                  Accuracy: 75                 39
Open-ended Question Answering    OlymBench Phys v1 (test)   Problem Level Score: 53.9    9