
Reasoning Episode Classification on Omni-MATH Non-Reasoning episodes (human-annotated gold set)

89.34 Accuracy

GPT-4.1

[Chart: Accuracy, Dec 23, 2025]
Updated 4d ago

Evaluation Results

Method    Links
2025.12   89.34   85.36
2025.12   89.34   85.35
2025.12   87.16   82.35
2025.12   84.43   78.62