
Zero-shot Reasoning on (OpenbookQA, ARC-e, ARC-c, WinoGrande, HellaSwag, PIQA, MathQA)

54.92 Average Accuracy

LittleBit

Updated 4d ago

Evaluation Results

Date     Avg    OpenbookQA  ARC-e  ARC-c  WinoGrande  HellaSwag  PIQA  MathQA  (unlabeled)
2026.02  54.92  -           -      -      -           -          -     -       -
2026.02  54.24  -           -      -      -           -          -     -       -
2026.02  52.3   -           -      -      -           -          -     -       -
2026.02  52     28          67     38     67          56         78    27      0
2026.02  51.01  -           -      -      -           -          -     -       -
2026.02  50.63  -           -      -      -           -          -     -       -
2026.02  50     29          68     36     65          45         75    25      390
2026.02  47.32  -           -      -      -           -          -     -       -
2026.02  47     27          63     33     64          45         71    24      920
2026.02  46     26          59     31     66          44         70    23      1,080
2026.02  44     22          58     29     63          43         69    24      1,470
2026.02  43     25          53     27     64          41         68    24      1,670
2026.02  41     23          50     29     62          36         65    23      2,110
2026.02  40     22          50     30     61          36         64    22      2,280
2026.02  38     22          41     27     58          34         61    23      2,630
2026.02  37     19          42     25     58          33         60    21      2,830
2026.02  34     16          33     25     52          30         54    23      3,410
2026.02  32     15          31     20     52          28         54    22      3,800
2026.02  31     14          28     22     50          27         55    21      3,990
2026.02  30     13          28     22     48          26         55    19      4,180
2026.02  29     12          26     21     49          26         53    18      4,380
2026.02
29116581059,600
2026.02
265220539,660
2026.02
161001109,780