
Zero-shot Evaluation on ARC (c/e), WinoGrande, BoolQ, HellaSwag, OBQA, PIQA, and MMLU (test val)

0.7361 Average Accuracy

Dense

[Chart: Average Accuracy; y-axis ticks 0.382812, 0.474531, 0.56625, 0.657969; x-axis tick Dec 28, 2025]
Updated 4d ago

Evaluation Results

Date       Average Accuracy
2025.12    0.7361
2025.12    0.7224
2025.12    0.7211
2025.12    0.716
2025.12    0.7134
2025.12    0.7124
2025.12    0.7113
2025.12    0.7054
2025.12    0.703
2025.12    0.6968
2025.12    0.696
2025.12    0.693
2025.12    0.6924
2025.12    0.6877
2025.12    0.6858
2025.12    0.6814
2025.12    0.6796
2025.12    0.6794
2025.12    0.6791
2025.12    0.6773
2025.12    0.6723
2025.12    0.6706
2025.12    0.6682
2025.12    0.6669
2025.12    0.6659
2025.12    0.6605
2025.12    0.655
2025.12    0.6465
2025.12    0.6421
2025.12    0.6329
2025.12    0.6327
2025.12    0.6321
2025.12    0.6288
2025.12    0.6265
2025.12    0.6227
2025.12    0.6219
2025.12    0.6161
2025.12    0.6155
2025.12    0.6153
2025.12    0.6119
2025.12    0.6116
2025.12    0.6086
2025.12    0.6034
2025.12    0.5983
2025.12    0.5961
2025.12    0.595
2025.12    0.5944
2025.12    0.5915
2025.12    0.5898
2025.12    0.5862
2025.12    0.5805
2025.12    0.5788
2025.12    0.5756
2025.12    0.573
2025.12    0.5727
2025.12    0.5678
2025.12    0.5662
2025.12    0.5625
2025.12    0.5616
2025.12    0.561
2025.12    0.5606
2025.12    0.5516
2025.12    0.5496
2025.12    0.5425
2025.12    0.5382
2025.12    0.5318
2025.12    0.5316
2025.12    0.5249
2025.12    0.5247
2025.12    0.5245
2025.12    0.52
2025.12    0.5156
2025.12    0.5139
2025.12    0.5094
2025.12    0.5091
2025.12    0.501
2025.12    0.488
2025.12    0.487
2025.12    0.487
2025.12    0.469
2025.12    0.467
2025.12    0.454
2025.12    0.4477
2025.12    0.4431
2025.12    0.4295
2025.12    0.4147
2025.12    0.4047
2025.12    0.3964
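
The page does not say how the Average Accuracy figures are aggregated. A minimal sketch, assuming an unweighted mean over the eight tasks named in the title (ARC-c, ARC-e, WinoGrande, BoolQ, HellaSwag, OBQA, PIQA, MMLU): the per-task values and the helper name `average_accuracy` below are hypothetical and are not taken from the leaderboard.

```python
from statistics import mean

# Hypothetical per-task zero-shot accuracies for a single model.
# Illustrative values only; none of them come from the leaderboard.
task_accuracy = {
    "ARC-c": 0.50,
    "ARC-e": 0.80,
    "WinoGrande": 0.70,
    "BoolQ": 0.80,
    "HellaSwag": 0.75,
    "OBQA": 0.45,
    "PIQA": 0.80,
    "MMLU": 0.60,
}


def average_accuracy(scores: dict[str, float]) -> float:
    """Return the unweighted mean of per-task accuracies.

    Assumption: every task counts equally, regardless of how many
    examples it contains. The leaderboard may aggregate differently.
    """
    if not scores:
        raise ValueError("no task scores given")
    return mean(scores.values())


if __name__ == "__main__":
    print(f"Average Accuracy: {average_accuracy(task_accuracy):.4f}")
```

If the leaderboard instead weights tasks by example count, the result would differ, so treat this only as one plausible reading of the metric.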