
Zero-shot Commonsense Reasoning Suite (PIQA, ARC, HS, WG, BoolQ, MMLU)

82.1 PIQA Accuracy (Zero-shot)

W16A16 Baseline

Mar 26, 2026
Updated 22d ago

Evaluation Results

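Method names and links are missing from this copy; rows are listed in their original order. Column labels after PIQA follow the order of the suite title, with ARC assumed to be split into easy (ARC-e) and challenge (ARC-c). Avg is the mean of the seven scores.

Date     PIQA   ARC-e  ARC-c  HS     WG     BoolQ  MMLU   Avg
2026.03  82.1   79.59  58.87  82.95  75.61  85.26  77.58  77.42
2026.03  80.41  77.4   49.15  79.37  72.14  80.55  52.77  70.26
2026.03  71.65  62.88  37.8   66.02  60.77  71.22  27.04  56.77
2026.03  71.16  66.96  40.96  63.38  62.51  64.25  43.5   58.96
2026.03  71     61.57  35.84  65.15  57.93  66.39  24.78  54.67
2026.03  69.21  57.37  34.56  61.95  56.91  65.44  23.56  52.71
2026.03  67.52  60.86  36.18  60.12  58.09  60.15  31.61  53.5
2026.03  61.1   44.87  27.47  41.03  50.67  58.5   21.14  43.54
2026.03  59.45  48.56  30.16  61.42  54.23  58.34  27.68  48.55
2026.03  54.57  35.14  24.66  35.29  51.46  56.45  24.9   40.35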
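
The last figure in each results row appears to be the unweighted mean of the seven scores before it. The short Python sketch below checks that relationship for the first and last rows. The benchmark names in the comments are an assumption taken from the order of the suite title (with ARC split into easy and challenge), and `mean_score` is a hypothetical helper, not part of any evaluation toolkit.

```python
# Scores copied from the first and last results rows.
# Assumed column order: PIQA, ARC-e, ARC-c, HS, WG, BoolQ, MMLU.
# The second element of each tuple is the final (average) value in that row.
rows = {
    "first row": ([82.1, 79.59, 58.87, 82.95, 75.61, 85.26, 77.58], 77.42),
    "last row": ([54.57, 35.14, 24.66, 35.29, 51.46, 56.45, 24.9], 40.35),
}


def mean_score(scores):
    """Return the unweighted mean of the scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


for name, (scores, reported_avg) in rows.items():
    computed = mean_score(scores)
    # Allow for rounding in the published figures.
    matches = abs(computed - reported_avg) < 0.005
    print(f"{name}: computed={computed}, reported={reported_avg}, matches={matches}")
```

The same check can be run on any other row by adding its seven scores and final value to `rows`.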