Physical Reasoning on PIQA (Accuracy & Speedup)

Accuracy: 91.3

Mistral Small 24B Base 2501

[Chart: PIQA results over time, Dec 2, 2025 to Jan 30, 2026]
Updated 4d ago

Evaluation Results

Date      Accuracy   Speedup   (unlabeled)
2026.01   91.3       -         8.2
2026.01   91.3       -         8.1
2026.01   91         -         9.7
2026.01   90.9       -         9.7
2026.01   89.5       -         7.3
2026.01   89.2       -         7
2026.01   88.8       -         7.6
2026.01   88.8       -         3.8
2026.01   88.6       -         7.4
2026.01   88.4       -         11.7
2026.01   88.4       -         11.7
2026.01   88.3       -         3.3
2026.01   86.6       -         8.8
2026.01   86.4       -         8.6
2026.01   86         -         4.3
2026.01   85.4       -         3.7
2026.01   83.4       -         0.7
2026.01   82.2       -         2
2026.01   82         -         7.8
2026.01   81.5       -         -1.1
2026.01   80.6       -         6.4
2026.01   80.1       -         -0.2
2026.01   79.3       -         2.5
2026.01   74         -         -2.8
2025.12   73.25      1.11      -
2025.12   73.14      1.14      -
2025.12   73.11      1.21      -
2025.12   73.09      1.1       -
2025.12   73.08      1.11      -
2025.12   73.06      1.14      -
2025.12   73.02      1         -
2025.12   72.78      1.55      -
2025.12   72.33      1.01      -
2025.12   58.81      2.5       -