Noise Sensitivity Analysis on 420-item set v4 vs v2 (full)
[Chart: Delta Performance (v4 vs v2) by model, Apr 20, 2026; highest value 6.58 (gpt-5.1)]
Evaluation Results
Method             Date      Delta Performance (v4 vs v2)
gpt-5.1            2026.04   6.58
glm-4.7            2026.04   5.22
gemini-3-flash     2026.04   4.35
gpt-5.2            2026.04   3.65
claude-opus-4.5    2026.04   3.20
glm-5              2026.04   1.58
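As a quick way to work with these results, the sketch below loads the six reported Delta Performance (v4 vs v2) values and derives the ranking, the mean delta, and the spread between the largest and smallest value. It only summarizes the numbers listed on this page; how each delta is computed from the v4 and v2 runs is not stated here, so the sketch does not attempt to recompute it. The `DELTAS` dictionary and the `summarize` helper are hypothetical names introduced for illustration.

```python
from statistics import mean

# Delta Performance (v4 vs v2) values as listed on this page (all dated 2026.04).
# Hypothetical container for illustration; the page does not define a data format.
DELTAS = {
    "gpt-5.1": 6.58,
    "glm-4.7": 5.22,
    "gemini-3-flash": 4.35,
    "gpt-5.2": 3.65,
    "claude-opus-4.5": 3.2,
    "glm-5": 1.58,
}


def summarize(deltas):
    """Rank models by delta (largest first) and report the mean and range of the deltas."""
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    values = list(deltas.values())
    return {
        "ranked": [name for name, _ in ranked],
        "mean": mean(values),
        "range": max(values) - min(values),
    }


if __name__ == "__main__":
    s = summarize(DELTAS)
    print(s["ranked"])
    print(f"mean delta = {s['mean']:.2f}, range = {s['range']:.2f}")
```

Sorting in descending order reproduces the order used on the leaderboard; the sketch makes no claim about whether a larger delta indicates higher or lower noise sensitivity.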