
MINT-Bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Tool-augmented reasoning | MINT-Bench | Success Rate (Turn 1): 9.85 | 5 |
| multi-turn interaction-based problem solving | MINT-Bench 1.0 (test) | Code Generation Score: 11.76 | 5 |