
Reasoning Episode Classification on Omni-MATH human-annotated Reasoning episodes (gold set)

86.33 Accuracy

GPT-5

[Chart: Accuracy, Dec 23, 2025]
Updated 4d ago

Evaluation Results

Method  Links
2025.12    86.33    82.85
2025.12    86.1     82.74
2025.12    86.02    82.54
2025.12    85.75    82.39
2025.12    82.9     78.67
2025.12    82.45    78.21
2025.12    80.89    75.96
2025.12    80.53    75.6