
Zero-shot Reasoning on Reasoning Suite (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c) (val test)

Best PIQA score: 85.56

Dense

[Chart: PIQA score over time, Jun 15, 2024 to Feb 9, 2026]
Updated 2d ago

Evaluation Results

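| Date | PIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | Avg. |
|---|---|---|---|---|---|---|
| 2025.05 | 85.56 | 79.27 | 77.94 | 78.84 | 56.49 | 75.62 |
| 2024.06 | 81.77 | 64.84 | 74.51 | 84.22 | 57.34 | 72.54 |
| 2025.06 | 80.52 | 79.36 | 72.3 | 77.53 | 49.23 | 71.79 |
| 2025.06 | 80.52 | 79.36 | 72.3 | 77.53 | 49.23 | 71.79 |
| 2026.02 | 80.47 | 79.39 | 72.22 | 77.48 | 49.23 | 71.76 |
| 2025.05 | 80.47 | 79.39 | 72.22 | 77.48 | 49.23 | 71.76 |
| 2024.06 | 79.71 | 60.19 | 72.61 | 80.09 | 50.34 | 68.59 |
| 2025.05 | 79.7 | 60.2 | 72.6 | 80.1 | 50.4 | 68.6 |
| 2026.02 | 79.22 | 76.92 | 66.46 | 62.16 | 42.66 | 65.48 |
| 2025.06 | 79.11 | 75.99 | 68.82 | 74.58 | 46.25 | 68.95 |
| 2026.02 | 79.11 | 75.99 | 69.06 | 74.58 | 46.25 | 69 |
| 2026.02 | 79.11 | 75.99 | 69.06 | 74.58 | 46.25 | 69 |
| 2025.05 | 79.11 | 73.83 | 75.77 | 78.32 | 54.18 | 72.24 |
| 2025.05 | 79.11 | 75.99 | 69.06 | 74.58 | 46.25 | 69 |
| 2025.06 | 78.18 | 74.11 | 69.85 | 72.77 | 46.33 | 68.25 |
| 2024.06 | 78.02 | 57.17 | 68.43 | 76.3 | 43.51 | 64.69 |
| 2025.06 | 77.58 | 74.29 | 70.24 | 74.16 | 45.56 | 68.37 |
| 2025.06 | 77.58 | 73.43 | 70.48 | 73.19 | 44.97 | 67.93 |
| 2025.06 | 77.53 | 73.94 | 70.88 | 73.23 | 44.8 | 68.07 |
| 2026.02 | 77.48 | 72.18 | 68.27 | 72.98 | 42.75 | 66.73 |
| 2026.02 | 77.48 | 67.13 | 61.88 | 65.78 | 38.48 | 62.15 |
| 2025.05 | 77.36 | 72.82 | 71.33 | 70.75 | 49.66 | 68.38 |
| 2025.05 | 77.25 | 72.26 | 71.63 | 71.49 | 49.58 | 68.44 |
| 2026.02 | 77.2 | 68.07 | 66.3 | 70.45 | 42.92 | 64.99 |
| 2026.02 | 77.09 | 71.3 | 66.3 | 71.36 | 41.89 | 65.59 |
| 2025.06 | 76.93 | 74.77 | 69.61 | 70.12 | 47.61 | 67.81 |
| 2026.02 | 76.93 | 72.2 | 66.3 | 65.82 | 41.38 | 64.53 |
| 2025.06 | 76.82 | 69.88 | 65.19 | 61.83 | 35.75 | 61.89 |
| 2025.05 | 76.82 | 69.81 | 64.8 | 61.87 | 35.67 | 61.79 |
| 2026.02 | 76.8 | 55.2 | 69.1 | 74.5 | 42.2 | 63.6 |
| 2025.06 | 76.77 | 71.71 | 70.56 | 73.4 | 45.05 | 67.5 |
| 2026.02 | 76.66 | 66.13 | 61.17 | 64.86 | 37.88 | 61.34 |
| 2025.06 | 76.5 | 71.19 | 70.32 | 73.32 | 44.45 | 67.16 |
| 2024.06 | 76.49 | 52.69 | 64.48 | 67.76 | 36.77 | 59.64 |
| 2025.06 | 76.44 | 68.41 | 64.96 | 62.25 | 36.6 | 61.73 |
| 2025.06 | 76.39 | 66.78 | 60.7 | 63.47 | 38.14 | 61.1 |
| 2025.06 | 76.39 | 70.2 | 66.3 | 68.01 | 42.32 | 64.64 |
| 2026.02 | 76.33 | 73.44 | 71.11 | 72.1 | 44.54 | 67.5 |
| 2025.05 | 76.28 | 54.3 | 71.27 | 75.84 | 43.9 | 64.32 |
| 2025.06 | 76.22 | 68.27 | 63.61 | 58.71 | 36.52 | 60.67 |
| 2026.02 | 76.22 | 69.59 | 68.19 | 71.71 | 41.89 | 65.52 |
| 2025.05 | 76.17 | 54.92 | 72.14 | 76.18 | 43.94 | 64.67 |
| 2025.05 | 76.1 | 54.2 | 71.7 | 75.7 | 43.6 | 64.26 |
| 2025.06 | 76.06 | 67.14 | 65.59 | 59.09 | 35.07 | 60.59 |
| 2024.06 | 76.01 | 51.8 | 64.33 | 67.93 | 36.86 | 59.39 |
| 2025.05 | 75.7 | 54 | 72.2 | 75.8 | 43.8 | 64.3 |
| 2024.06 | 75.68 | 49.94 | 62.35 | 64.9 | 36.6 | 57.89 |
| 2025.05 | 75.38 | 70.12 | 69.25 | 68.54 | 47.57 | 66.17 |
| 2025.05 | 75.06 | 64.96 | 69.18 | 71.43 | 45.26 | 65.18 |
| 2024.06 | 75.03 | 49.69 | 62.19 | 67.34 | 32.25 | 57.3 |
| 2025.06 | 75.03 | 66.42 | 66.06 | 70.03 | 38.48 | 63.2 |
| 2026.02 | 74.97 | 73.01 | 68.75 | 67.4 | 45.05 | 66.24 |
| 2025.05 | 74.81 | 60.58 | 61.01 | 54.42 | 31.14 | 56.39 |
| 2025.06 | 74.71 | 66.12 | 67.17 | 71.38 | 41.04 | 64.1 |
| 2024.06 | 74.7 | 64.29 | 61.96 | 57.49 | 36.69 | 59.03 |
| 2025.05 | 74.69 | 66.36 | 67.52 | 69.43 | 46.31 | 64.86 |
| 2024.06 | 74.64 | 46.93 | 60.22 | 66.16 | 34.13 | 56.42 |
| 2026.02 | 74.59 | 70.74 | 67.4 | 64.44 | 41.98 | 63.83 |
| 2025.05 | 74.58 | 68.45 | 64.37 | 61.29 | 34.87 | 60.71 |
| 2025.05 | 74.56 | 64.79 | 69.34 | 71.25 | 45.63 | 65.11 |
| 2024.06 | 74.48 | 46.62 | 63.69 | 65.7 | 34.3 | 56.96 |
| 2026.02 | 74.4 | 48.3 | 63.1 | 66 | 31.9 | 56.7 |
| 2025.06 | 74.37 | 66.91 | 67.4 | 62.88 | 36.69 | 61.65 |
| 2026.02 | 74.21 | 65.87 | 62.43 | 61.07 | 37.29 | 60.17 |
| 2026.02 | 74.2 | 47.7 | 60.3 | 65.4 | 31.3 | 55.8 |
| 2025.05 | 74.15 | 68.82 | 64.47 | 61.55 | 35.23 | 60.84 |
| 2024.06 | 74.1 | 46.61 | 58.17 | 64.31 | 33.62 | 55.36 |
| 2025.05 | 73.96 | 62.17 | 63.24 | 61.46 | 33.28 | 58.82 |
| 2025.06 | 73.94 | 69.72 | 67.4 | 68.35 | 41.98 | 64.28 |
| 2026.02 | 73.9 | 46.4 | 62.7 | 70.2 | 34.6 | 57.6 |
| 2024.06 | 73.88 | 50.08 | 62.19 | 67.09 | 34.47 | 57.54 |
| 2025.06 | 73.83 | 63.92 | 68.35 | 65.61 | 39.25 | 62.19 |
| 2026.02 | 73.72 | 66.8 | 66.61 | 66.12 | 41.98 | 63.05 |
| 2025.06 | 73.67 | 62.99 | 63.3 | 61.95 | 33.62 | 59.1 |
| 2025.05 | 73.62 | 68.55 | 68.73 | 66.16 | 43.39 | 64.09 |
| 2026.02 | 73.6 | 45.2 | 59.3 | 68.8 | 36.4 | 56.7 |
| 2025.05 | 73.59 | 65.68 | 64.52 | 60.41 | 34.99 | 59.84 |
| 2025.06 | 73.56 | 69.4 | 63.3 | 61.83 | 40.27 | 61.67 |
| 2025.05 | 73.26 | 62.25 | 68.37 | 67.54 | 43.38 | 62.96 |
| 2026.02 | 73.07 | 62.47 | 58.96 | 56.48 | 33.02 | 56.8 |
| 2025.05 | 72.91 | 46.45 | 69.14 | 67.34 | 36.01 | 58.37 |
| 2025.05 | 72.83 | 63.32 | 64.06 | 59.98 | 34.65 | 58.97 |
| 2024.06 | 72.74 | 45.75 | 55.72 | 61.36 | 31.06 | 53.33 |
| 2026.02 | 72.74 | 68.45 | 66.54 | 63.43 | 41.3 | 62.49 |
| 2026.02 | 72.74 | 67.8 | 70.8 | 60.35 | 41.3 | 62.6 |
| 2025.05 | 72.74 | 67.74 | 64.12 | 65.84 | 45.16 | 63.12 |
| 2025.05 | 72.48 | 60.45 | 67.52 | 62.69 | 39.73 | 60.57 |
| 2026.02 | 72.42 | 60.44 | 59.27 | 54.34 | 34.56 | 56.21 |
| 2024.06 | 72.36 | 45.1 | 58.8 | 59.64 | 30.03 | 53.19 |
| 2025.05 | 72.35 | 59.43 | 65.78 | 65.36 | 42.73 | 61.13 |
| 2025.05 | 72.2 | 45.6 | 68 | 69.7 | 35.2 | 58.14 |
| 2025.05 | 72.1 | 47.7 | 68.5 | 70.3 | 38 | 59.22 |
| 2025.05 | 72.08 | 60.42 | 67.93 | 62.38 | 39.89 | 60.54 |
| 2025.05 | 72.03 | 46.03 | 68.51 | 69.91 | 36.18 | 58.53 |
| 2024.06 | 71.87 | 45.17 | 59.51 | 66.5 | 36.52 | 55.91 |
| 2026.02 | 71.71 | 59.31 | 55.56 | 53.7 | 33.28 | 54.71 |
| 2024.06 | 71.54 | 40.71 | 55.4 | 62.16 | 28.92 | 51.75 |
| 2026.02 | 71.5 | 42.9 | 59.3 | 61.4 | 28.3 | 52.7 |
| 2025.05 | 71.42 | 56.05 | 67.13 | 52.18 | 35.79 | 56.51 |
| 2025.05 | 71.35 | 55.84 | 67.62 | 52.26 | 35.75 | 56.56 |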
Showing 100 of 177 rows