Zero-Shot Common Sense Reasoning and QA on (ARC-e, ARC-c, HellaSwag, OBQA, WinoGrande, MathQA, PIQA)

83.54 ARC-e (Dec 15, 2025)
Updated 3mo ago

Evaluation Results

Date     ARC-e  ARC-c  HellaSwag  OBQA  WinoGrande  MathQA  PIQA   Avg    (unlabeled)
2025.12  83.54  55.97  57.13      31    67.8        49.61   76.88  60.28  -
2025.12  78.96  49.49  52.47      29.6  69.38       40.37   74.65  56.42  3.86
2025.12  76.3   43.34  57.14      31.4  69.14       28.17   78.07  54.79  -
2025.12  74.28  42.41  46.57      28.8  65.43       32.16   72.31  51.71  8.57
2025.12  73.99  41.72  42.02      26.4  63.3        31.42   72.09  50.13  10.14
2025.12  73.23  40.02  51.9       31    68.43       27.71   75.84  52.59  2.2
2025.12  71.38  44.54  47.29      28.8  66.46       30.99   72.14  51.66  8.62
2025.12  68.31  35.67  45.95      26.2  65.98       24.66   72.42  48.46  6.34
2025.12  67.8   34.56  48.25      29.6  63.77       25.49   72.2   48.81  5.98
2025.12  59.18  27.82  38.22      24.2  66.54       23.89   66.54  43.77  11.02
2025.12  58.67  27.65  43.1       26.2  64.25       23.85   70.18  44.84  9.95
2025.12  58.08  33.28  40.18      23.6  61.72       25.33   68.23  44.35  15.93
2025.12  54.25  23.29  38.8       22.8  58.56       22.75   65.34  40.83  13.97
2025.12  52.9   24.06  38.01      22    61.72       22.88   67.19  41.25  13.54
2025.12  51.22  22.95  34.25      19.6  59.43       23.82   61.92  39.03  15.77
2025.12  47.98  23.89  30.57      17.6  50.99       23.08   61.64  36.54  23.74
2025.12  34.89  23.12  29.89      13.4  51.62       22.08   56.86  33.12  21.67
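
The eighth value in each row (60.28 in the top row) matches the unweighted mean of the seven benchmark scores that precede it. The page does not label this value, so treating it as an average is an inference from the numbers. The sketch below checks it for the top row; `top_row`, `reported`, and `mean_score` are illustrative names, not part of the leaderboard.

```python
# Scores copied from the top row, in the order the title lists the benchmarks:
# ARC-e, ARC-c, HellaSwag, OBQA, WinoGrande, MathQA, PIQA.
top_row = [83.54, 55.97, 57.13, 31.0, 67.8, 49.61, 76.88]

# Eighth value in the same row (assumed here to be the average).
reported = 60.28


def mean_score(scores):
    """Unweighted arithmetic mean, rounded to two decimals like the table."""
    return round(sum(scores) / len(scores), 2)


if __name__ == "__main__":
    computed = mean_score(top_row)
    print(f"computed mean = {computed}, reported value = {reported}")
```

The same check can be repeated on any other row by swapping in its seven scores and its eighth value.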