Judge Evaluation on Magis-Bench
[Chart: Score over time. Best score: 7.79 (Gemini-3 Pro), Mar 10, 2026.]
Updated 2mo ago
Evaluation Results

Method                  Details                     Date      Score
Gemini-3 Pro            Model variant=low           2026.03   7.79
Gemini-3 Pro            Model variant=high          2026.03   7.48
gpt-5.2                 Model variant=high          2026.03   6.99
gpt-5.2                 Model variant=instant       2026.03   6.66
gpt-4.1                 -                           2026.03   5.55
sabia-4                 -                           2026.03   5.08
sabia-3.1               -                           2026.03   4.97
deepseek                Model variant=v3.2          2026.03   4.88
Qwen3                   Model variant=235b          2026.03   4.52
sabiazinho-4            Price Range=cost-effec...   2026.03   4.5
kimi-k2                 Model variant=thinking      2026.03   4.49
gpt-5-mini              Price Range=cost-effec...   2026.03   4.47
gemini-2.5-flash-lite   Price Range=cost-effec...   2026.03   4.25
gpt-4.1-mini            Price Range=cost-effec...   2026.03   3.67
gpt-oss-120b            Price Range=cost-effec...   2026.03   3.62