Instruction Following on Instruction Following tasks
[Chart: Score and Performance Difference (Delta). Top result: GPT-5 with a score of 78.3, Jan 22, 2026. Updated 4d ago.]
Evaluation Results

| Method | Generation Mode | Date | Score | Performance Difference (Delta) |
| --- | --- | --- | --- | --- |
| GPT-5 | LLM-in... | 2026.01 | 78.3 | 7 |
| DeepSeek-V3.2-Thinking | LLM-in... | 2026.01 | 74.7 | 14.4 |
| MiniMax-M2 | Standa... | 2026.01 | 73 | - |
| Claude-Sonnet-4.5-Think | LLM-in... | 2026.01 | 72 | 12.7 |
| GPT-5 | Standa... | 2026.01 | 71.3 | - |
| Kimi-K2-Thinking | LLM-in... | 2026.01 | 68.7 | 3.7 |
| Kimi-K2-Thinking | Standa... | 2026.01 | 65 | - |
| MiniMax-M2 | LLM-in... | 2026.01 | 61.3 | -11.7 |
| DeepSeek-V3.2-Thinking | Standa... | 2026.01 | 60.3 | - |
| Claude-Sonnet-4.5-Think | Standa... | 2026.01 | 59.3 | - |
| Qwen3-Coder-30B-A3B | LLM-in... | 2026.01 | 40.3 | 5.6 |
| Qwen3-Coder-30B-A3B | Standa... | 2026.01 | 34.7 | - |
| Qwen3-4B-Instruct-2507 | Standa... | 2026.01 | 33.7 | - |
| Qwen3-4B-Instruct-2507 | LLM-in... | 2026.01 | 29 | -4.7 |
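The page does not state how the Performance Difference (Delta) column is computed. The listed values are consistent with subtracting each model's "Standa..." score from its "LLM-in..." score, so the sketch below reproduces the column under that assumption. The `scores` dictionary and the `delta` helper are hypothetical names introduced here for illustration; the mode labels are truncated in the source and are kept as they appear.

```python
# Scores copied from the results above. The generation-mode labels are
# truncated in the source ("LLM-in...", "Standa...") and are kept verbatim.
scores = {
    "GPT-5": {"LLM-in...": 78.3, "Standa...": 71.3},
    "DeepSeek-V3.2-Thinking": {"LLM-in...": 74.7, "Standa...": 60.3},
    "MiniMax-M2": {"LLM-in...": 61.3, "Standa...": 73.0},
    "Claude-Sonnet-4.5-Think": {"LLM-in...": 72.0, "Standa...": 59.3},
    "Kimi-K2-Thinking": {"LLM-in...": 68.7, "Standa...": 65.0},
    "Qwen3-Coder-30B-A3B": {"LLM-in...": 40.3, "Standa...": 34.7},
    "Qwen3-4B-Instruct-2507": {"LLM-in...": 29.0, "Standa...": 33.7},
}


def delta(model_scores):
    """Hypothetical helper: assumed delta = "LLM-in..." score minus "Standa..." score."""
    # Round to one decimal to match the precision shown on the page
    # and to absorb floating-point noise from the subtraction.
    return round(model_scores["LLM-in..."] - model_scores["Standa..."], 1)


deltas = {model: delta(s) for model, s in scores.items()}

# List models from largest gain to largest drop.
for model, d in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {d:+.1f}")
```

Under this reading, a positive delta means the "LLM-in..." mode scored higher than the "Standa..." mode for that model, and a negative delta (MiniMax-M2, Qwen3-4B-Instruct-2507) means it scored lower.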