
Zero-shot Common-sense Reasoning Suite (ARC-e, ARC-c, BoolQ, PIQA, SIQA, HellaSwag, OBQA, WinoGrande)

ARC-e Accuracy: 74.2

StatQAT-iterative

[Chart: ARC-e Accuracy by date (May 18, 2026)]
Updated 15d ago

Evaluation Results

Date     ARC-e  ARC-c  BoolQ  PIQA  SIQA  HellaSwag  OBQA  WinoGrande  Avg
2026.05  74.2   42.3   77.9   75.4  43.7  54.5       29.2  68.4        58.2
2026.05  69.7   38.7   69.9   73    41.8  47.1       26.4  61.6        53.5
2026.05  58.9   27.1   62.1   69.5  38.5  41.2       21.8  59.1        47.3
2026.05  58.6   28.3   64.2   70.5  39.6  42.4       23.4  57.9        48.1
2026.05  57.1   27.6   62.2   69.4  38.7  40         21.4  56.4        46.6
2026.05  46     20.3   57.8   64.7  38.2  32.7       18    52.5        41.3
2026.05  44.6   20.1   54.3   61.3  37.8  31.7       18    53.1        40.1
2026.05  43.7   19.1   55.1   61.6  37.4  31.8       16.2  52          39.6
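
The final value in each results row appears to be the unweighted mean of the eight per-task accuracies. A minimal sketch that checks this for the top row, assuming the scores follow the task order given in the suite title (the names `TASKS`, `top_row`, and `suite_average` are illustrative, not part of the leaderboard):

```python
# Task order is an assumption taken from the suite title.
TASKS = ["ARC-e", "ARC-c", "BoolQ", "PIQA", "SIQA", "HellaSwag", "OBQA", "WinoGrande"]

# Per-task accuracies of the top row in the results table.
top_row = [74.2, 42.3, 77.9, 75.4, 43.7, 54.5, 29.2, 68.4]


def suite_average(scores):
    """Return the unweighted mean of per-task accuracies, rounded to one decimal."""
    if len(scores) != len(TASKS):
        raise ValueError(f"expected {len(TASKS)} scores, got {len(scores)}")
    return round(sum(scores) / len(scores), 1)


print(suite_average(top_row))  # prints 58.2, matching the row's last value
```

The same check can be run on the other rows to confirm how the fused digits split into columns.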