
Instruction Following on Instruction Following tasks

Top score: 78.3 (GPT-5)

Updated 5mo ago

Evaluation Results

Method	Date	Score	Δ vs. same model's other entry
GPT-5	2026.01	78.3	7
DeepSeek-V3.2-Thinking	2026.01	74.7	14.4
MiniMax-M2	2026.01	73	-
Claude-Sonnet-4.5-Think	2026.01	72	12.7
GPT-5	2026.01	71.3	-
Kimi-K2-Thinking	2026.01	68.7	3.7
Kimi-K2-Thinking	2026.01	65	-
MiniMax-M2	2026.01	61.3	-11.7
DeepSeek-V3.2-Thinking	2026.01	60.3	-
Claude-Sonnet-4.5-Think	2026.01	59.3	-
Qwen3-Coder-30B-A3B	2026.01	40.3	5.6
Qwen3-Coder-30B-A3B	2026.01	34.7	-
Qwen3-4B-Instruct-2507	2026.01	33.7	-
Qwen3-4B-Instruct-2507	2026.01	29	-4.7
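Each model appears twice in the table, and the last numeric column is filled in on only one of the two rows. Checking the numbers suggests that this value is that row's score minus the score of the same model's other row. For example, GPT-5 has 78.3 and 71.3, and 78.3 − 71.3 = 7. The page does not say what distinguishes the two entries, so the pairing below is an inference from the arithmetic only. The sketch recomputes the column from the listed scores. The names `PAIRS` and `delta` are hypothetical.

```python
# Scores copied from the table. For each model, the first value is the entry
# that carries the extra number, and the second is the model's other entry.
# This pairing is inferred from the arithmetic; the page does not label it.
PAIRS = {
    "GPT-5": (78.3, 71.3),
    "DeepSeek-V3.2-Thinking": (74.7, 60.3),
    "MiniMax-M2": (61.3, 73.0),
    "Claude-Sonnet-4.5-Think": (72.0, 59.3),
    "Kimi-K2-Thinking": (68.7, 65.0),
    "Qwen3-Coder-30B-A3B": (40.3, 34.7),
    "Qwen3-4B-Instruct-2507": (29.0, 33.7),
}


def delta(pair):
    """Return this entry's score minus the same model's other score."""
    with_delta, other = pair
    # Round to one decimal place to match the table and drop float noise.
    return round(with_delta - other, 1)


if __name__ == "__main__":
    for model, pair in PAIRS.items():
        print(f"{model}: {delta(pair):+.1f}")
```

Every recomputed value matches the table, including the two negative ones for MiniMax-M2 and Qwen3-4B-Instruct-2507. In those cases the row that carries the number is the lower-scoring entry.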