
AGIEval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Mathematical Reasoning | AGIEval MATH | Accuracy: 95.7 | 99 |
| Comprehensive Examination | AGIEval (test) | Accuracy: 62.3 | 37 |
| General Reasoning | AGIEval | Exact Match: 70.4 | 33 |
| Mathematical Reasoning | AGIEval-MATH (test) | Accuracy: 93.3 | 31 |
| Natural Language Understanding | AGIEval | Accuracy: 71.6 | 30 |
| General Evaluation | AGIEval | Accuracy: 70.22 | 29 |
| Reasoning | AGIEval English | Score (%): 74.4 | 21 |
| Human-level Standardized Exam Evaluation | AGIEval | Score: 51.05 | 18 |
| Out-of-Domain Generalization | AGIEval Out-of-Domain Law (test) | Average OOD Accuracy: 43.41 | 16 |
| General Reasoning | AGIEval en | Speedup Ratio: 2.132 | 15 |
| General Reasoning | agieval | Accuracy: 63.71 | 14 |
| Question Answering | AGIEval | Vanilla Accuracy: 43.92 | 14 |
| Mathematical Proficiency | AGIEval MATH Level-5 | Accuracy: 64.45 | 13 |
| Question Answering | AGIEval | Accuracy: 32.11 | 12 |
| Reasoning | AGIEval | AGIEval Reasoning Accuracy: 48.88 | 10 |
| General Intelligence Evaluation | AGIEval (test) | AGIEval (3-shot): 27 | 8 |
| Question Answering | Agieval Cn | Accuracy: 36.58 | 7 |
| Standardized exam solving | AGIEval | Accuracy: 30.91 | 6 |
| Question Answering | AGIEval (test) | AQUA-RAT: 28.3 | 5 |
| General Intelligence Evaluation | AGIEval G | Accuracy: 72 | 4 |
| Standardized Exam Reasoning | AGIEval 5-shot | LSAT-RC (5-shot): 27.2 | 3 |
| Reasoning | AGIEval Cn | Normalized Accuracy: 36.44 | 3 |
| General Knowledge | AGIEval En | CoT EM: 77.92 | 3 |
| General Language Understanding | AGIEval 5-shot | Accuracy: 80.22 | 3 |
| Question Answering | AGIEval en (test) | Accuracy: 18.3 | 2 |