General Language Evaluation on English lm-evaluation-harness
[Chart: AGIEval Acc (Norm); Transformer + Spelling Bee Embeddings, 0.259, Jan 25, 2026]
Updated 4d ago
Evaluation Results

Metric                     Transformer + Spelling Bee Embeddings   Transformer
-------------------------  --------------------------------------  -------------------------
Details                    Total parameters=918m,...               Total parameters=918m,...
Date                       2026.01                                 2026.01
AGIEval Acc (Norm)         0.259                                   0.2585
ARC Easy Acc (Norm)        0.5387                                  0.5173
ARC Challenge Acc (Norm)   0.2764                                  0.267
BBH Acc (Norm)             0.0237                                  0.018
DROP F1                    0.049                                   0.0382
HellaSwag Acc (Norm)       0.4247                                  0.4204
MMLU Acc (Norm)            0.2302                                  0.2331
NQ Open Acc                0.0169                                  0.0166
OpenBookQA Acc (Norm)      0.31                                    0.296
PIQA Acc (Norm)            0.6839                                  0.6773
SciQ Acc (Norm)            0.716                                   0.676
TriviaQA Score             0.016                                   0.0238
WinoGrande Acc (Norm)      0.5375                                  0.5295
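
The two result rows above can be compared metric by metric. The following is a minimal Python sketch, not part of the leaderboard itself: the scores are copied from the Evaluation Results listing, the variable and function names are illustrative, and it assumes that a higher value is better for every metric (all are accuracy, F1, or score values).

```python
# Scores copied from the Evaluation Results listing, in column order.
# All names below are illustrative, not part of the leaderboard.
METRICS = [
    "AGIEval Acc (Norm)", "ARC Easy Acc (Norm)", "ARC Challenge Acc (Norm)",
    "BBH Acc (Norm)", "DROP F1", "HellaSwag Acc (Norm)", "MMLU Acc (Norm)",
    "NQ Open Acc", "OpenBookQA Acc (Norm)", "PIQA Acc (Norm)",
    "SciQ Acc (Norm)", "TriviaQA Score", "WinoGrande Acc (Norm)",
]
SPELLING_BEE = [0.259, 0.5387, 0.2764, 0.0237, 0.049, 0.4247, 0.2302,
                0.0169, 0.31, 0.6839, 0.716, 0.016, 0.5375]
TRANSFORMER = [0.2585, 0.5173, 0.267, 0.018, 0.0382, 0.4204, 0.2331,
               0.0166, 0.296, 0.6773, 0.676, 0.0238, 0.5295]


def deltas(candidate, baseline):
    """Return {metric: candidate - baseline}, rounded to 4 decimals."""
    return {m: round(c - b, 4) for m, c, b in zip(METRICS, candidate, baseline)}


diff = deltas(SPELLING_BEE, TRANSFORMER)

# Assumption: higher is better on every metric listed here.
wins = [m for m, d in diff.items() if d > 0]
losses = [m for m, d in diff.items() if d < 0]

for metric, d in diff.items():
    print(f"{metric:26s} {d:+.4f}")
print(f"Spelling Bee variant is higher on {len(wins)} of {len(METRICS)} metrics")
```

Under that assumption, the Spelling Bee variant scores higher on 11 of the 13 metrics and lower on MMLU and TriviaQA. The listing reports no variance, so small gaps should be read with caution.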