Abstention in Question Answering on QAQA False Premise
Top result: R-Tuning, Abstention F1 79.9 (May 25, 2026)
Metrics: Abstention F1, Abstention Recall, Abstention Precision, Accuracy
Updated 8d ago
Evaluation Results
| Method | Model | Date | Abstention F1 | Abstention Recall | Abstention Precision | Accuracy |
|---|---|---|---|---|---|---|
| R-Tuning | Qwen3-8B | 2026.05 | 79.9 | 90.2 | 71.8 | 82.5 |
| TIAR | Qwen3-8B | 2026.05 | 79.7 | 89.5 | 71.8 | 81.8 |
| TruthRL | Qwen3-8B | 2026.05 | 79.1 | 87.7 | 72 | 84.6 |
| DPO | Qwen3-8B | 2026.05 | 78 | 88.8 | 69.5 | 85.3 |
| RFT | Qwen3-8B | 2026.05 | 70.3 | 70.5 | 70 | 67.7 |
| TIAR | Llama-3.1-8B-Ins... | 2026.05 | 58.2 | 46.7 | 77.3 | 48.4 |
| DPO | Llama-3.1-8B-Ins... | 2026.05 | 58 | 47 | 75.7 | 46.7 |
| TruthRL | Llama-3.1-8B-Ins... | 2026.05 | 58 | 46.7 | 76.4 | 50.5 |
| RFT | Llama-3.1-8B-Ins... | 2026.05 | 57.1 | 45.6 | 76.5 | 44.9 |
| R-Tuning | Llama-3.1-8B-Ins... | 2026.05 | 56 | 45.6 | 72.6 | 44.2 |
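The page does not say how Abstention F1 is computed. A minimal sketch, assuming it is the usual harmonic mean of Abstention Precision and Abstention Recall, recomputes F1 for a few of the results listed above and compares it with the reported value. The function names and the `ROWS` sample are illustrative, not part of the benchmark.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale, e.g. 0-100).

    Assumption: the leaderboard's Abstention F1 follows this standard definition.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method, model, reported F1, recall, precision), copied from the results above.
ROWS = [
    ("R-Tuning", "Qwen3-8B", 79.9, 90.2, 71.8),
    ("TIAR", "Qwen3-8B", 79.7, 89.5, 71.8),
    ("RFT", "Qwen3-8B", 70.3, 70.5, 70.0),
    ("TIAR", "Llama-3.1-8B-Ins...", 58.2, 46.7, 77.3),
]


def max_f1_gap(rows) -> float:
    """Largest absolute difference between reported and recomputed F1."""
    return max(
        abs(f1_from_precision_recall(precision, recall) - reported)
        for _, _, reported, recall, precision in rows
    )


if __name__ == "__main__":
    for method, model, reported, recall, precision in ROWS:
        recomputed = f1_from_precision_recall(precision, recall)
        print(f"{method} ({model}): reported {reported}, recomputed {recomputed:.2f}")
    print(f"largest gap: {max_f1_gap(ROWS):.3f}")
```

For these rows the recomputed values stay within about 0.1 of the reported F1, which is what rounding the published precision and recall to one decimal place would produce. That is consistent with the harmonic-mean assumption but does not confirm it.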