Multi-turn instruction following on MultiIF
[Chart: Normalized Score. Top entry: Qwen3-Max-Thinking, 68.93 (Mar 24, 2026).]
Updated 24d ago
Evaluation Results
Method                                                    Normalized Score   Discriminability
Qwen3-Max-Thinking (formatting=multi-turn,...; 2026.03)   68.93              43
DeepSeek-V3.2 (formatting=multi-turn,...; 2026.03)        64.23              43
MiniMax-M2.5 (formatting=multi-turn,...; 2026.03)         31.76              43
Kimi-K2.5 (formatting=multi-turn,...; 2026.03)            26.94              43
GLM-5 (formatting=multi-turn,...; 2026.03)                26.25              43