
OOD Set

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Model Routing | OOD Set AIME Humanity's Last Exam SimpleQA OlympiadBench (test) | Avg. A: 50.8 | 11 |