
LiveBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Reasoning | LiveBench Reasoning | Accuracy: 92 | 80 |
| Reasoning | LiveBench | Accuracy: 22.3 | 25 |
| Code Generation | LiveBench | Avg@8: 42.9 | 22 |
| General Reasoning | LiveBench | Accuracy: 53.47 | 20 |
| Coding | LiveBench | Accuracy: 40.23 | 15 |
| Single-event Scene Revisit (Different Pose) | LiveBench | DINO Feature Similarity (FG): 0.691 | 8 |
| Single-event Scene Revisit (Same Pose) | LiveBench | PSNR (Background): 20.132 | 8 |
| General Tasks | LiveBench 2024-11-25 | Accuracy: 75.9 | 5 |
| Mathematical Reasoning | LiveBench Math (test) | Score: 51.95 | 5 |
| Examination | LiveBench 2024-11-25 | Score: 70.79 | 5 |
| General Tasks | LiveBench 0831 | Accuracy: 0.57 | 5 |