Coverage Analysis on 420-item set v4 (full)
[Chart: Pred% and Jud% over time. Labeled point: glm-5, Pred% 100, Apr 20, 2026.]
Updated 1mo ago
Evaluation Results
Method             Date     Pred%  Jud%
glm-5              2026.04  100    100
claude-opus-4.5    2026.04  100    97.2
Doubao-Seed-1.8    2026.04  100    86.3
glm-4.7            2026.04  100    88.4
claude-sonnet-4.5  2026.04  100    86.4
claude-haiku-4.5   2026.04  100    94.9
gemini-3-flash     2026.04  100    97.7
MiniMax-M2.1       2026.04  97.9   90.3
MiniMax-M2.5       2026.04  97.1   97.8
gpt-5.1            2026.04  95.7   81.4
gpt-5.2            2026.04  95     95.5
deepseek-v3.1      2026.04  87.6   80.7