Commonsense Reasoning Benchmark

75.91 BoolQ Accuracy

StelLA

[Chart: BoolQ accuracy over time, Oct 2, 2025 to Oct 28, 2025; y-axis ticks 70.918, 72.214, 73.51, 74.806]
Updated 9d ago

Evaluation Results

Date     BoolQ  (seven further task scores)                      Avg
2025.10  75.91  89.86  81.68  96.41  87.82  91.98  82.34  87.8   86.72
2025.10  75.38  88.01  79.94  95.35  86.29  90.54  79.69  86.07  85.16
2025.10  75.24  88.57  80.21  95.81  85.11  91.09  80.55  86.6   85.4
2025.10  75.16  88.14  80.18  95.41  86.74  90.84  78.7   87     85.27
2025.10  74.88  88.43  80.31  95.5   86.26  90     79.86  85.8   85.13
2025.10  74.67  88.12  80.5   94.98  85.22  90.15  78.87  85.6   84.76
2025.10  74.41  87.68  79.55  94.79  85.4   90.04  78.24  85     84.39
2025.10  73.62  84.87  80.64  91.44  84.5   86.43  72.84  84.33  82.33
2025.10  73.2   85.8   81.8   94.9   86     74.7   88.7   86     83.9
2025.10  73.2   86     82.4   95.2   87.1   75.7   88.7   86.4   84.3
2025.10  73.1   85.4   68.5   78.5   66.1   89.8   79.9   74.8   77
2025.10  73.1   85.3   81.8   95.1   86.3   75.3   88.6   86.8   84
2025.10  73.09  86.64  78.64  93.4   82.88  87.76  75.26  84.3   82.74
2025.10  72.8   85.3   81.9   95.2   85.6   74.9   88.8   86.4   83.9
2025.10  72.67  83.48  79.82  90.82  83.58  85.16  71.27  81.2   81
2025.10  72.5   85.2   81.9   94.2   86.7   73.7   87.1   87     83.5
2025.10  72.2   83.86  79.67  90.8   82.43  85.55  70.59  81.93  80.88
2025.10  72.02  83.46  79.87  90.44  82.69  84.83  71.19  81.53  80.76
2025.10  71.54  83.84  79.6   90.5   83.19  84.4   69.96  80.47  80.44
2025.10  71.23  80.96  78.33  80.91  77.59  81.76  66.69  79.8   77.16
2025.10  71.16  83.89  79.19  91     82.87  85.09  69.48  83.93  80.83
2025.10  71.11  82.7   78.64  89.41  81.48  83.58  68.17  80.2   79.41