Outcome Reasoning on MalAlgoQA

85.1 (F1 Mean)

GPT-5

May 17, 2025
Updated 4d ago

Evaluation Results

Date      F1 Mean   (unlabeled)
2025.05   85.1      79.6
2025.05   83.6      77.8
2025.05   74        68.2
2025.05   71.8      65.1
2025.05   69.6      62.9
2025.05   67.5      60.9
2025.05   57.4      50.7