Downstream Task Evaluation on ARC Challenge, BoolQ, OpenbookQA, GSM8K (Strict), and MMLU
[Chart: ARC Challenge Accuracy. Highlighted point: Original, 66.72 (May 19, 2025). Selectable metrics: ARC Challenge Accuracy, BoolQ Accuracy, OpenBookQA Accuracy, GSM8K (Strict) Accuracy, MMLU Accuracy, Average Score.]
Updated 20d ago
Evaluation Results
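| Method   | Configuration               | Date    | ARC Challenge Accuracy | BoolQ Accuracy | OpenBookQA Accuracy | GSM8K (Strict) Accuracy | MMLU Accuracy | Average Score |
|----------|-----------------------------|---------|------------------------|----------------|---------------------|-------------------------|---------------|---------------|
| Original | Compression=None, Backb...  | 2025.05 | 66.72                  | 88.5           | 41.2                | 82.79                   | 77.97         | 71.44         |
| A^3      | Compression=10%, Backb...   | 2025.05 | 61.18                  | 88.41          | 38                  | 75.89                   | 73.4          | 67.38         |
| SVD-LLM  | Compression=10%, Backb...   | 2025.05 | 57.51                  | 87.03          | 37.2                | 61.79                   | 71.34         | 62.97         |
| A^3      | Compression=20%, Backb...   | 2025.05 | 52.73                  | 86.45          | 34.8                | 60.73                   | 67.15         | 60.37         |
| SVD-LLM  | Compression=20%, Backb...   | 2025.05 | 50.34                  | 86.18          | 32.6                | 49.13                   | 67.73         | 57.2          |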
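The page does not say how the Average Score column is computed. The reported values are consistent, to two decimals, with an unweighted mean of the five task accuracies, and the sketch below checks that assumption against the rows listed above. The names `ROWS`, `average_score`, and `check_rows` are hypothetical helpers introduced for illustration only; they are not part of any benchmark tooling.

```python
# Per-task accuracies copied from the results listed above, in the order
# ARC Challenge, BoolQ, OpenBookQA, GSM8K (Strict), MMLU,
# paired with the Average Score reported on the page.
ROWS = {
    ("Original", "None"): ([66.72, 88.5, 41.2, 82.79, 77.97], 71.44),
    ("A^3", "10%"): ([61.18, 88.41, 38.0, 75.89, 73.4], 67.38),
    ("SVD-LLM", "10%"): ([57.51, 87.03, 37.2, 61.79, 71.34], 62.97),
    ("A^3", "20%"): ([52.73, 86.45, 34.8, 60.73, 67.15], 60.37),
    ("SVD-LLM", "20%"): ([50.34, 86.18, 32.6, 49.13, 67.73], 57.2),
}


def average_score(task_scores):
    """Unweighted mean of the task accuracies (an assumption, not documented)."""
    return sum(task_scores) / len(task_scores)


def check_rows(rows, tol=0.006):
    """Recompute each average and compare it with the reported value.

    tol is slightly above 0.005 so that values rounded to two decimals
    still count as a match despite floating-point noise.
    """
    results = {}
    for key, (scores, reported) in rows.items():
        recomputed = average_score(scores)
        results[key] = (recomputed, reported, abs(recomputed - reported) <= tol)
    return results


if __name__ == "__main__":
    for (method, compression), (recomputed, reported, ok) in check_rows(ROWS).items():
        print(f"{method:8s} compression={compression:4s} "
              f"recomputed={recomputed:.3f} reported={reported} match={ok}")
```

Running the script prints one line per row with the recomputed mean next to the reported Average Score, which makes it easy to spot a transcription error in any single task column.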