Commonsense Reasoning on Commonsense Reasoning Benchmark Intra-domain Multi-task

Top Average Accuracy: 87.27

[Chart: Average Accuracy by date, with the ST Baseline marked (Sep 29, 2025)]

Evaluation Results

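| Date    | Average Accuracy | (unlabeled) |
|---------|------------------|-------------|
| 2025.09 | 87.27            | -           |
| 2025.09 | 86.47            | 0.91        |
| 2025.09 | 86.34            | 1.06        |
| 2025.09 | 86.30            | 1.09        |
| 2025.09 | 86.14            | 1.30        |
| 2025.09 | 86.09            | 1.32        |
| 2025.09 | 85.96            | 1.43        |
| 2025.09 | 84.90            | -           |
| 2025.09 | 84.81            | 0.04        |
| 2025.09 | 84.57            | 0.32        |
| 2025.09 | 84.23            | 0.76        |
| 2025.09 | 83.88            | 1.17        |
| 2025.09 | 83.73            | 1.36        |
| 2025.09 | 83.64            | 1.48        |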
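The page reports a single "Average Accuracy" per entry without listing the subtasks behind it. A common convention for multi-task commonsense benchmarks is an unweighted (macro) mean of per-subtask accuracies. The sketch below assumes that convention. The function name `average_accuracy` and the subtask names are hypothetical and are not taken from this leaderboard.

```python
from statistics import fmean


def average_accuracy(per_task_accuracy: dict[str, float]) -> float:
    """Macro average (assumed convention): every subtask counts equally,
    regardless of how many examples it contains. Values are in percent."""
    if not per_task_accuracy:
        raise ValueError("at least one subtask accuracy is required")
    return fmean(per_task_accuracy.values())


# Hypothetical subtask names and accuracies (percent), for illustration only.
example = {"task_a": 90.0, "task_b": 85.0, "task_c": 80.0}
print(round(average_accuracy(example), 2))  # 85.0
```

Under a macro average, a small subtask moves the headline number as much as a large one. A leaderboard that instead pooled all examples (a micro average) could rank entries differently.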