Zero-shot Language Modeling and Reasoning on PIQA, ARC, HellaSwag, WinoG, BoolQ, LAMBADA, and C4
[Chart: PIQA Accuracy by method over time; top result L2QER, 76.33 (Mar 26, 2026). Other selectable metrics: ARC-C, ARC-E, HellaSwag (normalized), WinoG, BoolQ, and LAMBADA accuracy; C4 word perplexity; average accuracy.]
Updated 22d ago
Evaluation Results
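Method | Settings | Date | PIQA Acc. (%) | ARC-C Acc. (%) | ARC-E Acc. (%) | HellaSwag Acc., normalized (%) | WinoG Acc. (%) | BoolQ Acc. (%) | LAMBADA Acc. (%) | C4 Word Perplexity (lower is better) | Average Accuracy (%)
------ | -------- | ---- | ------------- | -------------- | -------------- | ------------------------------ | -------------- | ------------- | --------------- | ------------------------------------ | --------------------
L2QER | Rank=64, Evaluation=Ze... | 2026.03 | 76.33 | 42.33 | 70.00 | 66.00 | 67.00 | 80.67 | 68.00 | 8.93 | 67.19
FP16 | Rank=-, Evaluation=Zer... | 2026.03 | 76.00 | 41.33 | 70.33 | 66.00 | 68.00 | 80.33 | 72.33 | 8.70 | 67.76
QERA | Rank=64, Evaluation=Ze... | 2026.03 | 76.00 | 42.67 | 70.67 | 67.00 | 67.33 | 81.00 | 69.67 | 8.91 | 67.76
GlowQ-S | Rank=64, Evaluation=Ze... | 2026.03 | 76.00 | 43.67 | 69.33 | 66.00 | 66.67 | 82.00 | 70.00 | 8.99 | 67.67
ZeroQuant-V2 | Rank=64, Evaluation=Ze... | 2026.03 | 75.67 | 43.00 | 71.00 | 65.00 | 65.33 | 80.00 | 68.33 | 9.07 | 66.90
GlowQ | Rank=64, Evaluation=Ze... | 2026.03 | 75.67 | 41.67 | 70.00 | 66.67 | 67.00 | 80.33 | 69.67 | 8.87 | 67.29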
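The Average Accuracy column appears to be the unweighted mean of the seven accuracy columns, with C4 word perplexity excluded: recomputing it that way and rounding to two decimals reproduces every reported average in the table. The sketch below is an illustrative check under that assumption, not the leaderboard's own scoring code; the names RESULTS, average_accuracy, rank_by_average, and best_per_metric are hypothetical helpers introduced here, and the scores are transcribed from the table.

```python
# Illustrative sketch: recompute "Average Accuracy" from the table above.
# Assumption: the average is the plain mean of the 7 accuracy columns and
# ignores C4 word perplexity (this reproduces all reported averages).

METRICS = ["PIQA", "ARC-C", "ARC-E", "HellaSwag", "WinoG", "BoolQ", "LAMBADA", "C4 PPL"]

# Scores transcribed from the Evaluation Results table (last entry = C4 perplexity).
RESULTS = {
    "L2QER":        [76.33, 42.33, 70.00, 66.00, 67.00, 80.67, 68.00, 8.93],
    "FP16":         [76.00, 41.33, 70.33, 66.00, 68.00, 80.33, 72.33, 8.70],
    "QERA":         [76.00, 42.67, 70.67, 67.00, 67.33, 81.00, 69.67, 8.91],
    "GlowQ-S":      [76.00, 43.67, 69.33, 66.00, 66.67, 82.00, 70.00, 8.99],
    "ZeroQuant-V2": [75.67, 43.00, 71.00, 65.00, 65.33, 80.00, 68.33, 9.07],
    "GlowQ":        [75.67, 41.67, 70.00, 66.67, 67.00, 80.33, 69.67, 8.87],
}

# Averages as reported in the table, used to check the assumption.
REPORTED_AVG = {
    "L2QER": 67.19, "FP16": 67.76, "QERA": 67.76,
    "GlowQ-S": 67.67, "ZeroQuant-V2": 66.90, "GlowQ": 67.29,
}


def average_accuracy(scores):
    """Unweighted mean of the first seven entries (accuracies), rounded to 2 decimals."""
    accuracies = scores[:7]  # drop the trailing perplexity value
    return round(sum(accuracies) / len(accuracies), 2)


def rank_by_average(results):
    """(method, average) pairs, highest average first; ties keep table order."""
    averages = [(method, average_accuracy(s)) for method, s in results.items()]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)


def best_per_metric(results):
    """Best method per column: highest accuracy, but lowest perplexity."""
    best = {}
    for i, metric in enumerate(METRICS):
        column = {method: s[i] for method, s in results.items()}
        pick = min if metric == "C4 PPL" else max
        best[metric] = pick(column, key=column.get)
    return best


if __name__ == "__main__":
    for method, avg in rank_by_average(RESULTS):
        print(f"{method:<14} {avg:6.2f}  (reported {REPORTED_AVG[method]:.2f})")
    print(best_per_metric(RESULTS))
```

On this reading, FP16 and QERA tie for the highest average (67.76), while no single method leads every column; for example, GlowQ-S has the best ARC-C and BoolQ scores and FP16 the lowest C4 perplexity.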