Track A Evaluation on 420-item set v4 (full)
[Chart: Exact Match Accuracy and UC Accuracy. Highlighted point: gpt-5.2, 51.9 Exact Match Accuracy, Apr 20, 2026]
Updated 1mo ago
Evaluation Results
Method             Date     Exact Match Accuracy  UC Accuracy
gpt-5.2            2026.04  51.9                  87.6
gpt-5.1            2026.04  48.1                  85.2
glm-5              2026.04  47.1                  86.2
deepseek-v3.1      2026.04  43.1                  81.7
claude-opus-4.5    2026.04  41.9                  85
Doubao-Seed-1.8    2026.04  38.6                  75.7
glm-4.7            2026.04  37.6                  76.4
claude-sonnet-4.5  2026.04  33.6                  72.1
claude-haiku-4.5   2026.04  24.3                  66.2
gemini-3-flash     2026.04  9.8                   52.6
MiniMax-M2.5       2026.04  0                     24.3
MiniMax-M2.1       2026.04  0                     18.8
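
For readers unfamiliar with the metric, the following is a minimal sketch of how an exact-match accuracy over a fixed item set is commonly computed. It is an illustration, not this benchmark's scoring code: the function name `exact_match_accuracy`, the whitespace-only normalization, and the sample data are all assumptions, and the page does not state how answers are normalized or how UC Accuracy is defined.

```python
def exact_match_accuracy(predictions, references):
    """Return the percentage of predictions that equal their reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty evaluation set")
    # Assumption: only surrounding whitespace is ignored. The benchmark's
    # actual normalization rules are not given on the page.
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return round(100.0 * matches / len(references), 1)


# Hypothetical data: a 420-item set with 218 exactly correct answers.
references = [f"answer-{i}" for i in range(420)]
predictions = references[:218] + ["wrong"] * 202
print(exact_match_accuracy(predictions, references))
```

With these hypothetical counts, 218 of 420 rounds to 51.9, which shows how a one-decimal score in the table could arise from an integer number of correct items. The real per-model counts are not published here.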