Commonsense Reasoning on BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA (test)

73.54 BoolQ Accuracy

Flexora

[Chart: BoolQ accuracy by date; axis range roughly 57 to 70; point shown at Aug 20, 2024]
Updated 4d ago

Evaluation Results

Date     BoolQ  PIQA   HellaSwag  WinoGrande  ARC-e  ARC-c  OBQA   Avg.
2024.08  73.54  71.93  85.28      74.11       71.22  45.64  39.86  65.94
2024.08  67.76  69.8   76.1       67.01       67.21  35.23  38.6   60.24
2024.08  63.4   72.15  49.83      56.4        49.45  34.31  35.86  51.63
2024.08  57.98  60.94  34.35      52.25       31.82  27.3   35.8   42.92
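
Each result row above carries eight scores. The last one appears to be the unweighted mean of the first seven task accuracies. The sketch below checks that reading. It assumes the scores follow the task order in the page title (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA); the names `ROWS` and `mean_of_tasks` are illustrative and not part of the leaderboard.

```python
# Scores copied from the four result rows: seven task accuracies followed by
# the reported average. Column order is an assumption taken from the page title.
ROWS = [
    [73.54, 71.93, 85.28, 74.11, 71.22, 45.64, 39.86, 65.94],
    [67.76, 69.8, 76.1, 67.01, 67.21, 35.23, 38.6, 60.24],
    [63.4, 72.15, 49.83, 56.4, 49.45, 34.31, 35.86, 51.63],
    [57.98, 60.94, 34.35, 52.25, 31.82, 27.3, 35.8, 42.92],
]


def mean_of_tasks(row):
    """Return the unweighted mean of every score except the last, to 2 decimals."""
    tasks = row[:-1]
    return round(sum(tasks) / len(tasks), 2)


for row in ROWS:
    reported = row[-1]
    computed = mean_of_tasks(row)
    # Allow a small tolerance for rounding in the reported figure.
    matches = abs(reported - computed) < 0.01
    print(f"reported={reported:.2f} computed={computed:.2f} match={matches}")
```

If a row fails this check, that usually means its fused digits were split at the wrong place. This makes the average a handy sanity test when recovering scores from the flattened page.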