
AGIEval

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Comprehensive Examination | AGIEval (test) | Accuracy: 62.3 | 34
General Reasoning | AGIEval | Exact Match: 70.4 | 33
General Evaluation | AGIEval | Accuracy: 70.22 | 29
Mathematical Reasoning | AGIEval MATH | Accuracy: 95.7 | 28
Natural Language Understanding | AGIEval | Accuracy: 71.6 | 24
Out-of-Domain Generalization | AGIEval Out-of-Domain Law (test) | Average OOD Accuracy: 43.41 | 16
General Reasoning | AGIEval en | Speedup Ratio: 2.132 | 15
Human-level Standardized Exam Evaluation | AGIEval | Score: 45.87 | 14
General Reasoning | agieval | Accuracy: 63.71 | 14
Question Answering | AGIEval | Vanilla Accuracy: 43.92 | 14
Question Answering | AGIEval | Accuracy: 32.11 | 12
Mathematical Reasoning | AGIEval-MATH (test) | Accuracy: 52.1 | 11
Reasoning | AGIEval | AGIEval Reasoning Accuracy: 48.88 | 10
General Intelligence Evaluation | AGIEval (test) | AGIEval (3-shot): 27 | 8
Question Answering | AGIEval (test) | AQUA-RAT: 28.3 | 5
General Intelligence Evaluation | AGIEval G | Accuracy: 72 | 4
General Knowledge | AGIEval En | CoT EM: 77.92 | 3
General Language Understanding | AGIEval 5-shot | Accuracy: 80.22 | 3