Error Correction on 300 False Premises
[Chart: Isolated CR by model. Top score: 99 (Claude Sonnet 4.5), May 7, 2026. Selectable metrics: Isolated CR, Contextualized CR, Suppression Rate, #Suppressed.]
Updated 26d ago
Evaluation Results
Method                       Isolated CR (%)  Contextualized CR (%)  Suppression Rate (%)  #Suppressed
Claude Sonnet 4.5 (2026.05)  99               68.3                   31                    92
Qwen3.5-Plus (2026.05)       98.7             79.7                   19.3                  57
Qwen3.5-9B (2026.05)         97               52.3                   46                    134
GPT-5.1 (2026.05)            96.3             10                     89.6                  259
LLaMA3.1-8B (2026.05)        96.3             50.7                   47.4                  137
DeepSeek-V3.2 (2026.05)      93.7             15                     84                    236
Grok 4.1 Fast (2026.05)      92               17.3                   81.2                  224
Gemini 3 Flash (2026.05)     80               13.7                   82.9                  199
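
The page does not define its columns, so the following is only a sketch of how they appear to relate. It assumes that the benchmark has 300 items (taken from the title), that both CR columns are percentages of those items, that #Suppressed counts items answered correctly in isolation but not in context, and that Suppression Rate is that count divided by the number correct in isolation. The function name and parameters are hypothetical.

```python
# Assumption: 300 items, inferred from the benchmark title.
N_ITEMS = 300


def suppression_stats(isolated_cr, contextualized_cr, n_items=N_ITEMS):
    """Return (#suppressed, suppression rate in %) from two CR percentages.

    Assumes every item that is correct in context is also correct in
    isolation, so the difference of the two counts is the suppressed count.
    """
    isolated_correct = round(n_items * isolated_cr / 100)
    contextual_correct = round(n_items * contextualized_cr / 100)
    suppressed = isolated_correct - contextual_correct
    rate = 100 * suppressed / isolated_correct
    return suppressed, round(rate, 1)


# (Isolated CR, Contextualized CR) pairs copied from the results table.
rows = {
    "Claude Sonnet 4.5": (99, 68.3),
    "Qwen3.5-Plus": (98.7, 79.7),
    "GPT-5.1": (96.3, 10),
    "Gemini 3 Flash": (80, 13.7),
}

for name, (iso, ctx) in rows.items():
    count, rate = suppression_stats(iso, ctx)
    print(f"{name}: #Suppressed={count}, Suppression Rate={rate}%")
```

Under these assumptions the computed values match the reported Suppression Rate and #Suppressed columns for the rows shown, which suggests, but does not confirm, that the leaderboard derives them this way.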