Abstention on AbstentionBench FreshQA
[Chart: top result 75.2 Abstention F1, Claude Sonnet 4.5, May 25, 2026. Selectable metrics: Abstention F1, Abstention Recall, Abstention Precision, Accuracy.]
Updated 8d ago
Evaluation Results
Method             Details                    Date     Abstention F1  Abstention Recall  Abstention Precision  Accuracy
Claude Sonnet 4.5  Model Type=Proprietary...  2026.05  75.2           73.2               77.4                  63.6
TIAR               Backbone=Llama-3.1-8B-...  2026.05  68.4           71.4               65.6                  52.3
GPT-5.2            Model Type=Proprietary...  2026.05  48.8           35.7               76.9                  70.5
Gemini 3           Model Type=Proprietary...  2026.05  32.8           19.6               100                   81.8
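The page does not say how Abstention F1 is computed. The reported values are consistent with the standard harmonic mean of Abstention Precision and Abstention Recall, so the sketch below checks the results above under that assumption. The function name `f1_score` and the `rows` layout are illustrative and do not come from the benchmark's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, on the same scale as the inputs.

    Assumption: the leaderboard uses the standard F1 definition.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (abstention recall, abstention precision, reported abstention F1),
# copied from the evaluation results above.
rows = {
    "Claude Sonnet 4.5": (73.2, 77.4, 75.2),
    "TIAR": (71.4, 65.6, 68.4),
    "GPT-5.2": (35.7, 76.9, 48.8),
    "Gemini 3": (19.6, 100.0, 32.8),
}


def max_deviation(rows: dict) -> float:
    """Largest gap between the recomputed F1 and the reported F1."""
    return max(
        abs(f1_score(precision, recall) - reported)
        for recall, precision, reported in rows.values()
    )


if __name__ == "__main__":
    for name, (recall, precision, reported) in rows.items():
        print(f"{name}: recomputed {f1_score(precision, recall):.1f}, reported {reported}")
```

Under this assumption every recomputed value agrees with the reported F1 to within rounding. The Gemini 3 row shows why F1 is the headline metric: its precision is 100 but its recall is 19.6, so the harmonic mean is pulled down toward the lower of the two.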