Commonsense Reasoning on PIQA, WinoGrande, HellaSwag, BoolQ, SocialIQA, and OpenBookQA

90.2 PIQA Accuracy

Qwen2.5-14B

[Chart: accuracy axis ticks 78.032, 81.191, 84.35, 87.509; date axis: Dec 26, 2025]
Updated 3d ago

Evaluation Results

Date     PIQA  WinoGrande  HellaSwag  BoolQ  SocialIQA  OpenBookQA  Average
2025.12  90.2  85.8        92.9       71.2   78.9       91.4        85.1
2025.12  89.5  86.7        95.1       71.6   79.1       91.6        85.6
2025.12  86.7  83.9        89.4       68.7   77.6       87.4        82.3
2025.12  85.9  81.8        91.7       70.4   80         86.2        82.3
2025.12  85.9  85          93.8       71.3   80.4       84          83.7
2025.12  83.3  81.8        90.6       69.5   75.8       79          80
2025.12  78.5  77.7        83.9       62.4   74.3       76.8        75.6