
Zero-shot Reasoning on ARC-e, HellaSwag, PIQA, and Winogrande

77.2 Normalized Avg Accuracy

BF16

[Chart: Normalized Avg Accuracy; date label: May 16, 2025]
Updated 4d ago

Evaluation Results

Date     Normalized Avg Accuracy  (five further metric columns; headers not preserved)
2025.05  77.2                     -      -      -      -      -
2025.05  75.4                     -      -      -      -      -
2025.05  72.4                     -      -      -      -      -
2025.05  68.9                     -      -      -      -      -
2025.05  61.4                     -      -      -      -      -
2025.05  51.1                     -      -      -      -      -
2025.05  48                       -      -      -      -      -
2025.05  47.3                     -      -      -      -      -
2025.05  47                       -      -      -      -      -
2025.05  46                       -      -      -      -      -
2025.05  45.2                     -      -      -      -      -
2025.05  44.7                     -      -      -      -      -
2025.05  43.3                     -      -      -      -      -
2025.05  42.5                     -      -      -      -      -
2025.05  42.5                     -      -      -      -      -
2025.05  41.5                     -      -      -      -      -
2025.05  41.4                     -      -      -      -      -
2025.05  40.4                     -      -      -      -      -
2025.05  39.4                     -      -      -      -      -
2025.05  38.7                     -      -      -      -      -
2025.05  37.4                     -      -      -      -      -
2025.05  36.7                     -      -      -      -      -
2025.05  35.9                     -      -      -      -      -
2025.05  35                       -      -      -      -      -
2025.05  34                       -      -      -      -      -
2025.05  33                       -      -      -      -      -
2025.05  32.9                     -      -      -      -      -
2025.05  32.8                     -      -      -      -      -
2025.05  32.8                     -      -      -      -      -
2025.05  32.4                     -      -      -      -      -
2025.05  32.4                     -      -      -      -      -
2025.05  32.3                     -      -      -      -      -
2025.05  32.1                     -      -      -      -      -
2025.05  32                       -      -      -      -      -
2025.05  31.9                     -      -      -      -      -
2025.05  31.8                     -      -      -      -      -
2026.02  -                        74.62  69.22  76     79.11  68.85
2026.02  -                        26.39  48.62  25.64  52.99  38.41
2026.02  -                        26.05  50.2   25.7   52.39  38.59
2026.02  -                        26.81  49.49  25.83  53.81  38.99
2026.02  -                        36.99  56.04  30.49  56.96  45.12
2026.02  -                        41.12  58.17  31.75  58.49  47.38
2026.02  -                        43.27  57.77  32.14  58.92  48.03
2026.02  -                        25.8   47.36  25.55  52.67  37.85
2026.02  -                        25.84  48.7   25.64  52.83  38.25
2026.02  -                        25.13  49.17  25.48  52.94  38.18
2026.02  -                        31.65  51.14  28.38  54.57  41.44
2026.02  -                        34.18  54.06  28.88  55.5   43.16
2026.02  -                        35.56  55.8   29.58  56.58  44.38
2026.02  -                        24.49  51.85  25.4   53.16  38.73
2026.02  -                        25.38  48.46  25.61  51.96  37.85
2026.02  -                        26.68  48.86  25.76  51.8   38.28
2026.02  -                        26.73  47.43  26.89  53.48  38.63
2026.02  -                        28.2   51.22  27.36  52.83  39.9
2026.02  -                        31.02  52.01  30.38  54.62  42.01
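
In most 2026.02 rows, the last value looks like the unweighted mean of the four values before it; for example 26.39, 48.62, 25.64, and 52.99 average to 38.41. The sketch below checks that relationship for two rows. It assumes the five values per row are four per-benchmark accuracies followed by their average, which the page does not state; `mean_accuracy` and `is_consistent` are hypothetical helper names, not part of any leaderboard tooling.

```python
def mean_accuracy(scores):
    """Unweighted mean of per-benchmark accuracies, rounded to 2 decimals."""
    return round(sum(scores) / len(scores), 2)


def is_consistent(row, tol=0.01):
    """True if the last value of `row` is within `tol` of the mean of the others.

    Assumption: `row` holds four per-benchmark scores followed by a reported
    average. The column meaning is inferred from the numbers, not from the page.
    """
    *per_task, reported = row
    return abs(mean_accuracy(per_task) - reported) <= tol


# Two 2026.02 rows, split into individual values.
rows = [
    [26.39, 48.62, 25.64, 52.99, 38.41],
    [36.99, 56.04, 30.49, 56.96, 45.12],
]

for row in rows:
    print(row[:4], "->", mean_accuracy(row[:4]), "reported:", row[4])
```

A small tolerance is used rather than exact equality because the reported values are rounded to two decimals.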