
MINT-Bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Tool-augmented reasoning | MINT-Bench | Success Rate (Turn 1): 9.85 | 5 |
| Multi-turn interaction-based problem solving | MINT-Bench 1.0 (test) | Code Generation Score: 11.76 | 5 |