Commonsense Reasoning on PIQA, WinoG., HellaS., BoolQ, SIQA, OBQA (test)

Best PIQA Accuracy: 89.9 (LLMBOOST)

[Chart: PIQA Accuracy by date; x-axis: Dec 26, 2025]

Evaluation Results

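| Date | PIQA | WinoG. | HellaS. | BoolQ | SIQA | OBQA | Avg. |
|---|---|---|---|---|---|---|---|
| 2025.12 | 89.9 | 88.8 | 95.4 | 74.4 | 81.2 | 91.4 | 86.9 |
| 2025.12 | 89.5 | 86.7 | 95.1 | 71.6 | 79.1 | 91.6 | 85.6 |
| 2025.12 | 87.9 | 86.1 | 94.5 | 71.7 | 81.6 | 85.6 | 84.4 |
| 2025.12 | 87.7 | 83.2 | 90.3 | 71.8 | 80.4 | 85.4 | 83.1 |
| 2025.12 | 87.5 | 82.2 | 91.3 | 70.2 | 79.6 | 85.8 | 82.7 |
| 2025.12 | 87.4 | 86.3 | 95.2 | 72.5 | 80.7 | 84.8 | 84.5 |
| 2025.12 | 87.1 | 84.3 | 90.2 | 70.8 | 79.3 | 89.2 | 83.5 |
| 2025.12 | 86.9 | 84.9 | 93.5 | 70.5 | 79.2 | 84.8 | 83.3 |
| 2025.12 | 86.7 | 83.9 | 89.4 | 68.7 | 77.6 | 87.4 | 82.3 |
| 2025.12 | 86.3 | 84.5 | 91.3 | 70.7 | 80.0 | 88.8 | 83.6 |
| 2025.12 | 86.1 | 84.1 | 91.6 | 68.6 | 79.1 | 88.2 | 83.0 |
| 2025.12 | 86.1 | 83.3 | 91.5 | 70.3 | 78.4 | 86.8 | 82.7 |
| 2025.12 | 86.1 | 82.0 | 91.7 | 70.4 | 80.0 | 87.6 | 83.0 |
| 2025.12 | 85.9 | 81.8 | 91.4 | 69.7 | 79.0 | 86.2 | 82.3 |
| 2025.12 | 85.3 | 85.0 | 92.2 | 70.9 | 79.9 | 85.0 | 83.1 |
| 2025.12 | 84.8 | 80.8 | 85.1 | 66.2 | 77.3 | 86.0 | 80.0 |
| 2025.12 | 84.7 | 81.4 | 90.4 | 67.1 | 78.0 | 78.8 | 80.1 |
| 2025.12 | 84.5 | 80.1 | 80.6 | 68.1 | 77.7 | 84.6 | 79.3 |
| 2025.12 | 84.2 | 83.2 | 91.4 | 69.8 | 76.9 | 78.4 | 81.0 |
| 2025.12 | 84.2 | 82.0 | 91.5 | 70.2 | 79.0 | 79.6 | 81.1 |
| 2025.12 | 84.1 | 80.6 | 83.5 | 67.3 | 78.0 | 83.0 | 79.4 |
| 2025.12 | 84.1 | 79.7 | 87.6 | 67.8 | 77.4 | 83.8 | 80.4 |
| 2025.12 | 83.9 | 80.2 | 82.6 | 68.9 | 78.4 | 86.4 | 80.1 |
| 2025.12 | 83.4 | 80.0 | 86.3 | 68.7 | 77.9 | 84.0 | 80.1 |
| 2025.12 | 83.3 | 81.8 | 90.6 | 69.5 | 75.8 | 79.0 | 80.0 |
| 2025.12 | 82.8 | 80.0 | 88.7 | 66.5 | 76.7 | 77.4 | 78.7 |
| 2025.12 | 82.8 | 81.3 | 89.9 | 67.1 | 78.4 | 78.2 | 79.6 |
| 2025.12 | 81.7 | 81.3 | 86.8 | 67.3 | 76.1 | 84.8 | 79.7 |
| 2025.12 | 80.5 | 76.4 | 85.1 | 66.7 | 77.5 | 77.2 | 77.2 |
| 2025.12 | 79.4 | 76.9 | 84.7 | 66.2 | 75.4 | 75.8 | 76.4 |
| 2025.12 | 78.5 | 77.7 | 83.9 | 62.4 | 74.3 | 76.8 | 75.6 |
| 2025.12 | 75.7 | 75.9 | 83.4 | 64.6 | 73.8 | 77.2 | 75.1 |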
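The last value in each row appears to be the unweighted mean of its six benchmark scores: for the top row, (89.9 + 88.8 + 95.4 + 74.4 + 81.2 + 91.4) / 6 = 86.85, which rounds to the 86.9 shown. The Python sketch below checks that reading for a few rows. It assumes the six score columns follow the benchmark order in the title (PIQA, WinoG., HellaS., BoolQ, SIQA, OBQA) and that every value is rounded to one decimal; the names `ROWS`, `TOLERANCE`, and `check_average` are hypothetical, chosen for illustration only.

```python
from statistics import fmean

# Assumption: a reported one-decimal average may differ from the true mean by at
# most half a unit in the last digit (0.05); 1e-9 absorbs floating-point error.
TOLERANCE = 0.05 + 1e-9

# A few rows copied from the leaderboard: six scores in the assumed column order
# (PIQA, WinoG., HellaS., BoolQ, SIQA, OBQA), followed by the reported average.
ROWS = [
    ([89.9, 88.8, 95.4, 74.4, 81.2, 91.4], 86.9),
    ([87.9, 86.1, 94.5, 71.7, 81.6, 85.6], 84.4),
    ([84.2, 83.2, 91.4, 69.8, 76.9, 78.4], 81.0),
    ([75.7, 75.9, 83.4, 64.6, 73.8, 77.2], 75.1),
]


def check_average(scores: list[float], reported: float) -> tuple[float, bool]:
    """Return the unweighted mean of `scores` and whether it rounds to `reported`."""
    mean = fmean(scores)
    return mean, abs(mean - reported) <= TOLERANCE


if __name__ == "__main__":
    for scores, reported in ROWS:
        mean, matches = check_average(scores, reported)
        label = "matches" if matches else "differs"
        print(f"PIQA {scores[0]}: mean {mean:.2f}, reported {reported}, {label}")
```

Applied to every row, most reported averages agree with the recomputed mean within rounding, but a few do not; for example, the row with PIQA 87.9 recomputes to about 84.57 against a reported 84.4. The last column is therefore best read as a reported figure rather than one recomputed from the rounded scores shown.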