Abstention in Question Answering on BBQ Underspecified Intent
Highlighted result: R-Tuning, 91.2 Abstention F1 (May 25, 2026)
[Chart: Abstention F1 by date]
Metrics: Abstention F1, Abstention Recall, Abstention Precision, Accuracy
Updated 8d ago
Evaluation Results
| Method | Model | Date | Abstention F1 | Abstention Recall | Abstention Precision | Accuracy |
|---|---|---|---|---|---|---|
| R-Tuning | Qwen3-8B | 2026.05 | 91.2 | 85.6 | 97.6 | 98.1 |
| TIAR | Qwen3-8B | 2026.05 | 91.2 | 85.3 | 98.1 | 98.3 |
| DPO | Qwen3-8B | 2026.05 | 90.8 | 84.9 | 97.5 | 98 |
| TruthRL | Qwen3-8B | 2026.05 | 90.8 | 84.6 | 97.9 | 98.1 |
| RFT | Qwen3-8B | 2026.05 | 90.4 | 84.5 | 97.2 | 95.3 |
| RFT | Llama-3.1-8B-Ins... | 2026.05 | 85.9 | 84.8 | 87.2 | 67.6 |
| TIAR | Llama-3.1-8B-Ins... | 2026.05 | 84.6 | 92.4 | 77.9 | 70.7 |
| TruthRL | Llama-3.1-8B-Ins... | 2026.05 | 84.5 | 91.5 | 78.5 | 71.7 |
| DPO | Llama-3.1-8B-Ins... | 2026.05 | 84.4 | 91.2 | 78.5 | 67.4 |
| R-Tuning | Llama-3.1-8B-Ins... | 2026.05 | 82.4 | 90.6 | 75.5 | 62.7 |
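The page does not define how Abstention F1 is computed. A minimal sketch, assuming the standard definition (the harmonic mean of Abstention Precision and Abstention Recall), recomputes F1 from the reported precision and recall so the columns can be sanity-checked against each other. The function names and the `ROWS` list are illustrative, not part of the benchmark; the numbers are copied from the results above, and the Llama model name is kept truncated as it appears on the page.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent).

    Assumption: the benchmark uses the standard F1 definition.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method, model, reported F1, recall, precision), copied from the results above.
ROWS = [
    ("R-Tuning", "Qwen3-8B", 91.2, 85.6, 97.6),
    ("TIAR", "Qwen3-8B", 91.2, 85.3, 98.1),
    ("DPO", "Qwen3-8B", 90.8, 84.9, 97.5),
    ("TruthRL", "Qwen3-8B", 90.8, 84.6, 97.9),
    ("RFT", "Qwen3-8B", 90.4, 84.5, 97.2),
    ("RFT", "Llama-3.1-8B-Ins...", 85.9, 84.8, 87.2),
    ("TIAR", "Llama-3.1-8B-Ins...", 84.6, 92.4, 77.9),
    ("TruthRL", "Llama-3.1-8B-Ins...", 84.5, 91.5, 78.5),
    ("DPO", "Llama-3.1-8B-Ins...", 84.4, 91.2, 78.5),
    ("R-Tuning", "Llama-3.1-8B-Ins...", 82.4, 90.6, 75.5),
]


def max_f1_gap(rows) -> float:
    """Largest absolute gap between reported and recomputed F1."""
    return max(
        abs(f1_from_precision_recall(precision, recall) - reported)
        for _, _, reported, recall, precision in rows
    )


if __name__ == "__main__":
    for method, model, reported, recall, precision in ROWS:
        recomputed = f1_from_precision_recall(precision, recall)
        print(f"{method:9s} {model:20s} reported={reported:.1f} recomputed={recomputed:.2f}")
    print(f"max gap: {max_f1_gap(ROWS):.3f}")
```

Under this assumption, every recomputed value agrees with the reported Abstention F1 to within about 0.1 point, which is consistent with the precision and recall columns having been rounded to one decimal place.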