
Zero-shot Reasoning on Reasoning Suite (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c) (val test)

76.55 Average Accuracy

FP16

[Chart: Average Accuracy by date, Jun 15, 2024 to May 12, 2026]
Updated 20d ago

Evaluation Results

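| Date | Average | PIQA | HellaSwag | WinoGrande | ARC-e | ARC-c |
|---|---|---|---|---|---|---|
| 2026.05 | 76.55 | - | - | - | - | - |
| 2026.05 | 76.33 | - | - | - | - | - |
| 2025.05 | 75.62 | 85.56 | 79.27 | 77.94 | 78.84 | 56.49 |
| 2026.05 | 75 | - | - | - | - | - |
| 2026.05 | 74.48 | 81.01 | 79.3 | 74.43 | 82.87 | 54.78 |
| 2026.05 | 73.78 | - | - | - | - | - |
| 2026.05 | 72.67 | - | - | - | - | - |
| 2026.05 | 72.57 | 79.71 | 77.86 | 72.3 | 78.87 | 54.1 |
| 2024.06 | 72.54 | 81.77 | 64.84 | 74.51 | 84.22 | 57.34 |
| 2026.05 | 72.53 | 80.36 | 78.13 | 71.51 | 79.76 | 52.9 |
| 2026.05 | 72.38 | - | - | - | - | - |
| 2025.05 | 72.24 | 79.11 | 73.83 | 75.77 | 78.32 | 54.18 |
| 2026.05 | 72.09 | 80.25 | 77.96 | 71.03 | 78.91 | 52.3 |
| 2026.05 | 71.91 | 78.84 | 77.98 | 71.82 | 78.7 | 52.22 |
| 2025.06 | 71.79 | 80.52 | 79.36 | 72.3 | 77.53 | 49.23 |
| 2025.06 | 71.79 | 80.52 | 79.36 | 72.3 | 77.53 | 49.23 |
| 2026.02 | 71.76 | 80.47 | 79.39 | 72.22 | 77.48 | 49.23 |
| 2025.05 | 71.76 | 80.47 | 79.39 | 72.22 | 77.48 | 49.23 |
| 2026.05 | 71.71 | - | - | - | - | - |
| 2026.05 | 71.53 | 77.58 | 74.86 | 67.64 | 80.89 | 56.66 |
| 2026.05 | 71.47 | - | - | - | - | - |
| 2026.05 | 70.68 | 77.26 | 73.19 | 68.35 | 79.42 | 55.2 |
| 2026.05 | 70.12 | 76.22 | 73.58 | 67.72 | 78.41 | 54.69 |
| 2026.05 | 69.15 | 75.68 | 73.24 | 64.09 | 79.25 | 53.5 |
| 2026.02 | 69 | 79.11 | 75.99 | 69.06 | 74.58 | 46.25 |
| 2026.02 | 69 | 79.11 | 75.99 | 69.06 | 74.58 | 46.25 |
| 2025.05 | 69 | 79.11 | 75.99 | 69.06 | 74.58 | 46.25 |
| 2026.05 | 68.96 | - | - | - | - | - |
| 2025.06 | 68.95 | 79.11 | 75.99 | 68.82 | 74.58 | 46.25 |
| 2026.05 | 68.75 | 75.63 | 73.27 | 66.54 | 73.27 | 55.03 |
| 2025.05 | 68.6 | 79.7 | 60.2 | 72.6 | 80.1 | 50.4 |
| 2024.06 | 68.59 | 79.71 | 60.19 | 72.61 | 80.09 | 50.34 |
| 2025.05 | 68.44 | 77.25 | 72.26 | 71.63 | 71.49 | 49.58 |
| 2025.05 | 68.38 | 77.36 | 72.82 | 71.33 | 70.75 | 49.66 |
| 2026.05 | 68.38 | 75.08 | 68.46 | 65.9 | 78.45 | 54.01 |
| 2025.06 | 68.37 | 77.58 | 74.29 | 70.24 | 74.16 | 45.56 |
| 2026.05 | 68.29 | - | - | - | - | - |
| 2025.06 | 68.25 | 78.18 | 74.11 | 69.85 | 72.77 | 46.33 |
| 2025.06 | 68.07 | 77.53 | 73.94 | 70.88 | 73.23 | 44.8 |
| 2025.06 | 67.93 | 77.58 | 73.43 | 70.48 | 73.19 | 44.97 |
| 2025.06 | 67.81 | 76.93 | 74.77 | 69.61 | 70.12 | 47.61 |
| 2025.06 | 67.5 | 76.77 | 71.71 | 70.56 | 73.4 | 45.05 |
| 2026.02 | 67.5 | 76.33 | 73.44 | 71.11 | 72.1 | 44.54 |
| 2026.05 | 67.17 | - | - | - | - | - |
| 2025.06 | 67.16 | 76.5 | 71.19 | 70.32 | 73.32 | 44.45 |
| 2026.05 | 66.83 | 76.66 | 71.55 | 66.83 | 71.09 | 46.25 |
| 2026.02 | 66.73 | 77.48 | 72.18 | 68.27 | 72.98 | 42.75 |
| 2026.05 | 66.51 | - | - | - | - | - |
| 2026.02 | 66.24 | 74.97 | 73.01 | 68.75 | 67.4 | 45.05 |
| 2025.05 | 66.17 | 75.38 | 70.12 | 69.25 | 68.54 | 47.57 |
| 2026.05 | 66 | 76.17 | 70.7 | 67.88 | 70.03 | 45.22 |
| 2026.05 | 65.86 | 74.32 | 66.68 | 63.14 | 74.24 | 50.94 |
| 2026.05 | 65.79 | 73.61 | 66.54 | 63.77 | 74.79 | 50.26 |
| 2026.02 | 65.59 | 77.09 | 71.3 | 66.3 | 71.36 | 41.89 |
| 2026.02 | 65.52 | 76.22 | 69.59 | 68.19 | 71.71 | 41.89 |
| 2026.02 | 65.48 | 79.22 | 76.92 | 66.46 | 62.16 | 42.66 |
| 2026.05 | 65.19 | 76.17 | 70.79 | 67.01 | 67.85 | 44.11 |
| 2025.05 | 65.18 | 75.06 | 64.96 | 69.18 | 71.43 | 45.26 |
| 2025.05 | 65.11 | 74.56 | 64.79 | 69.34 | 71.25 | 45.63 |
| 2026.05 | 65.09 | 73.07 | 65.94 | 61.01 | 74.49 | 50.94 |
| 2026.05 | 65.02 | 75.63 | 70.38 | 65.11 | 69.36 | 44.62 |
| 2026.02 | 64.99 | 77.2 | 68.07 | 66.3 | 70.45 | 42.92 |
| 2025.05 | 64.86 | 74.69 | 66.36 | 67.52 | 69.43 | 46.31 |
| 2026.05 | 64.74 | 75.63 | 70.03 | 65.43 | 69.28 | 43.34 |
| 2024.06 | 64.69 | 78.02 | 57.17 | 68.43 | 76.3 | 43.51 |
| 2025.05 | 64.67 | 76.17 | 54.92 | 72.14 | 76.18 | 43.94 |
| 2025.06 | 64.64 | 76.39 | 70.2 | 66.3 | 68.01 | 42.32 |
| 2026.02 | 64.53 | 76.93 | 72.2 | 66.3 | 65.82 | 41.38 |
| 2025.05 | 64.32 | 76.28 | 54.3 | 71.27 | 75.84 | 43.9 |
| 2025.05 | 64.3 | 75.7 | 54 | 72.2 | 75.8 | 43.8 |
| 2025.06 | 64.28 | 73.94 | 69.72 | 67.4 | 68.35 | 41.98 |
| 2025.05 | 64.26 | 76.1 | 54.2 | 71.7 | 75.7 | 43.6 |
| 2025.06 | 64.1 | 74.71 | 66.12 | 67.17 | 71.38 | 41.04 |
| 2025.05 | 64.09 | 73.62 | 68.55 | 68.73 | 66.16 | 43.39 |
| 2026.02 | 63.83 | 74.59 | 70.74 | 67.4 | 64.44 | 41.98 |
| 2026.05 | 63.83 | 71.65 | 66.11 | 60.3 | 73.06 | 48.04 |
| 2026.02 | 63.6 | 76.8 | 55.2 | 69.1 | 74.5 | 42.2 |
| 2026.05 | 63.36 | - | - | - | - | - |
| 2025.06 | 63.2 | 75.03 | 66.42 | 66.06 | 70.03 | 38.48 |
| 2025.05 | 63.12 | 72.74 | 67.74 | 64.12 | 65.84 | 45.16 |
| 2026.02 | 63.05 | 73.72 | 66.8 | 66.61 | 66.12 | 41.98 |
| 2025.05 | 62.96 | 73.26 | 62.25 | 68.37 | 67.54 | 43.38 |
| 2026.05 | 62.65 | - | - | - | - | - |
| 2026.02 | 62.6 | 72.74 | 67.8 | 70.8 | 60.35 | 41.3 |
| 2026.02 | 62.49 | 72.74 | 68.45 | 66.54 | 63.43 | 41.3 |
| 2026.05 | 62.38 | - | - | - | - | - |
| 2025.06 | 62.19 | 73.83 | 63.92 | 68.35 | 65.61 | 39.25 |
| 2026.02 | 62.15 | 77.48 | 67.13 | 61.88 | 65.78 | 38.48 |
| 2025.06 | 61.89 | 76.82 | 69.88 | 65.19 | 61.83 | 35.75 |
| 2025.05 | 61.79 | 76.82 | 69.81 | 64.8 | 61.87 | 35.67 |
| 2025.06 | 61.73 | 76.44 | 68.41 | 64.96 | 62.25 | 36.6 |
| 2025.06 | 61.67 | 73.56 | 69.4 | 63.3 | 61.83 | 40.27 |
| 2025.06 | 61.65 | 74.37 | 66.91 | 67.4 | 62.88 | 36.69 |
| 2026.02 | 61.34 | 76.66 | 66.13 | 61.17 | 64.86 | 37.88 |
| 2025.05 | 61.13 | 72.35 | 59.43 | 65.78 | 65.36 | 42.73 |
| 2025.06 | 61.1 | 76.39 | 66.78 | 60.7 | 63.47 | 38.14 |
| 2025.05 | 60.84 | 74.15 | 68.82 | 64.47 | 61.55 | 35.23 |
| 2025.05 | 60.71 | 74.58 | 68.45 | 64.37 | 61.29 | 34.87 |
| 2025.06 | 60.67 | 76.22 | 68.27 | 63.61 | 58.71 | 36.52 |
| 2025.06 | 60.59 | 76.06 | 67.14 | 65.59 | 59.09 | 35.07 |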
Showing 100 of 297 rows
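
The page does not state how Average Accuracy is derived. For rows that report all five task scores, the value appears to match the unweighted mean of PIQA, HellaSwag, WinoGrande, ARC-e, and ARC-c, rounded to two decimals, though a few rows deviate slightly. A minimal sketch under that assumption follows; the names `TASKS` and `average_accuracy` are hypothetical and not part of the leaderboard.

```python
from statistics import mean

# Task order taken from the leaderboard title; the averaging rule itself is an assumption.
TASKS = ("PIQA", "HellaSwag", "WinoGrande", "ARC-e", "ARC-c")


def average_accuracy(scores: dict[str, float]) -> float:
    """Return the unweighted mean of the five task accuracies, rounded to two decimals."""
    missing = [task for task in TASKS if task not in scores]
    if missing:
        raise ValueError(f"missing task scores: {missing}")
    return round(mean(scores[task] for task in TASKS), 2)


# Per-task scores from the 2025.05 row whose listed average is 75.62.
row = {
    "PIQA": 85.56,
    "HellaSwag": 79.27,
    "WinoGrande": 77.94,
    "ARC-e": 78.84,
    "ARC-c": 56.49,
}
print(average_accuracy(row))  # 75.62
```

Rows that list only an average (shown with dashes) cannot be checked this way, since their per-task scores are not reported.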