
Multi-turn instruction following on MultiIF

68.93 Normalized Score

Qwen3-Max-Thinking

Updated 4mo ago

Evaluation Results

Method	Date	Normalized Score	
Qwen3-Max-Thinking	2026.03	68.93	43
DeepSeek-V3.2	2026.03	64.23	43
MiniMax-M2.5	2026.03	31.76	43
Kimi-K2.5	2026.03	26.94	43
GLM-5	2026.03	26.25	43