Abstention in Question Answering on BB Answer Unknown
[Chart: Abstention F1 over time. Top result: RFT, 97.9 (May 25, 2026). Other plotted metrics: Abstention Recall, Abstention Precision, Accuracy.]
Updated 8d ago
Evaluation Results
| Method | Model | Date | Abstention F1 | Abstention Recall | Abstention Precision | Accuracy |
| --- | --- | --- | --- | --- | --- | --- |
| RFT | Llama-3.1-8B-Ins... | 2026.05 | 97.9 | 100 | 95.8 | 87 |
| TIAR | Llama-3.1-8B-Ins... | 2026.05 | 97.9 | 100 | 95.8 | 87 |
| TruthRL | Llama-3.1-8B-Ins... | 2026.05 | 95.8 | 100 | 92 | 87 |
| DPO | Llama-3.1-8B-Ins... | 2026.05 | 93.9 | 100 | 88.5 | 69.6 |
| R-Tuning | Llama-3.1-8B-Ins... | 2026.05 | 91.7 | 95.7 | 88 | 91.3 |
| RFT | Qwen3-8B | 2026.05 | 89.8 | 95.7 | 84.6 | 78.3 |
| TruthRL | Qwen3-8B | 2026.05 | 85.2 | 100 | 74.2 | 100 |
| DPO | Qwen3-8B | 2026.05 | 83.6 | 100 | 71.9 | 95.7 |
| R-Tuning | Qwen3-8B | 2026.05 | 82.1 | 100 | 69.7 | 100 |
| TIAR | Qwen3-8B | 2026.05 | 79.3 | 100 | 65.7 | 100 |
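
The page does not state how Abstention F1 is derived. As a rough sanity check, the sketch below assumes it is the standard harmonic mean of Abstention Precision and Abstention Recall and recomputes it for a few of the reported rows. The helper name `f1_score` and the `rows` list are illustrative, not part of the benchmark; the numbers are copied from the results above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent).

    Assumption: the leaderboard's Abstention F1 uses this standard definition.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method, model, precision, recall, reported F1), copied from the results above
rows = [
    ("RFT", "Llama-3.1-8B-Ins...", 95.8, 100.0, 97.9),
    ("DPO", "Llama-3.1-8B-Ins...", 88.5, 100.0, 93.9),
    ("R-Tuning", "Llama-3.1-8B-Ins...", 88.0, 95.7, 91.7),
    ("RFT", "Qwen3-8B", 84.6, 95.7, 89.8),
    ("TIAR", "Qwen3-8B", 65.7, 100.0, 79.3),
]

for method, model, precision, recall, reported in rows:
    computed = f1_score(precision, recall)
    # Inputs are rounded to one decimal, so allow a small tolerance.
    status = "ok" if abs(computed - reported) < 0.1 else "mismatch"
    print(f"{method:<9} {model:<20} computed={computed:.2f} reported={reported} {status}")
```

Under this assumption, the recomputed values agree with the reported Abstention F1 to within rounding of the published precision and recall figures.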