Track C Evaluation on 420-item set v4 (full)
[Chart: KPI, Tier Metric. Top result: gpt-5.2, KPI 87.85, Apr 20, 2026]
Updated 1mo ago
Evaluation Results
| Method | Date | KPI | Tier Metric |
|---|---|---|---|
| gpt-5.2 | 2026.04 | 87.85 | - |
| gpt-5.1 | 2026.04 | 82.18 | - |
| glm-5 | 2026.04 | 79.82 | - |
| deepseek-v3.1 | 2026.04 | 79.22 | - |
| claude-opus-4.5 | 2026.04 | 77.66 | - |
| Doubao-Seed-1.8 | 2026.04 | 75.69 | - |
| glm-4.7 | 2026.04 | 75.21 | - |
| claude-sonnet-4.5 | 2026.04 | 71.75 | - |
| claude-haiku-4.5 | 2026.04 | 62.76 | - |
| gemini-3-flash | 2026.04 | 42 | - |
| MiniMax-M2.5 | 2026.04 | 12.98 | - |
| MiniMax-M2.1 | 2026.04 | 12.85 | - |