Instruction Following on Multi-IF PT
[Chart: Accuracy. Highlighted point: Gemini-3 Pro, 88, Mar 10, 2026.]
Updated 2mo ago
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| Gemini-3 Pro | Model variant=high | 2026.03 | 88 |
| gpt-5.2 | Model variant=high | 2026.03 | 87.2 |
| Gemini-3 Pro | Model variant=low | 2026.03 | 86 |
| kimi-k2 | Model variant=thinking | 2026.03 | 86 |
| gpt-5-mini | Price Range=cost-effec... | 2026.03 | 85.8 |
| Qwen3 | Model variant=235b | 2026.03 | 84.4 |
| gpt-5.2 | Model variant=instant | 2026.03 | 83.7 |
| gpt-4.1 | | 2026.03 | 82.7 |
| sabia-4 | | 2026.03 | 82 |
| gpt-oss-120b | Price Range=cost-effec... | 2026.03 | 82 |
| deepseek | Model variant=v3.2 | 2026.03 | 81.5 |
| sabiazinho-4 | Price Range=cost-effec... | 2026.03 | 81 |
| gemini-2.5-flash-lite | Price Range=cost-effec... | 2026.03 | 80.8 |
| sabia-3.1 | | 2026.03 | 80.7 |
| gpt-4.1-mini | Price Range=cost-effec... | 2026.03 | 79.6 |