
LiveBench

Benchmarks

Task Name               Dataset Name           Metric    SOTA Result  Trend
Reasoning               LiveBench Reasoning    Accuracy  92           80
Code Generation         LiveBench              Avg@8     42.9         22
Coding                  LiveBench              Accuracy  40.23        15
Mathematical Reasoning  LiveBench Math (test)  Score     51.95        5
Examination             LiveBench 2024-11-25   Score     70.79        5
General Tasks           LiveBench 0831         Accuracy  0.57         5