Abstention in Question Answering on FreshQA Stale
[Chart: Abstention F1 by date (May 25, 2026); best result R-Tuning at 81.2. Selectable metrics: Abstention F1, Abstention Recall, Abstention Precision, Accuracy.]
Evaluation Results
Method    Model                Date     Abstention F1  Abstention Recall  Abstention Precision  Accuracy
R-Tuning  Qwen3-8B             2026.05  81.2           84.2               78.4                  89
TruthRL   Qwen3-8B             2026.05  81.1           82.5               79.8                  90
TIAR      Qwen3-8B             2026.05  80.4           81.4               79.6                  88
DPO       Qwen3-8B             2026.05  79.6           81.4               77.8                  84
TruthRL   Llama-3.1-8B-Ins...  2026.05  74.7           75.7               73.6                  54
TIAR      Llama-3.1-8B-Ins...  2026.05  74.5           75.1               73.9                  55
RFT       Llama-3.1-8B-Ins...  2026.05  72.2           66.7               78.7                  50
DPO       Llama-3.1-8B-Ins...  2026.05  72.1           70.1               74.3                  48
RFT       Qwen3-8B             2026.05  71.9           67.2               77.3                  73
R-Tuning  Llama-3.1-8B-Ins...  2026.05  69.9           66.1               74.1                  40
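As a consistency check on the results table, every reported Abstention F1 agrees, to within rounding (less than 0.1), with the harmonic mean of the reported Abstention Precision and Abstention Recall. The sketch below assumes the page uses the standard definition F1 = 2PR / (P + R). The helper names `abstention_f1` and `check_rows` are hypothetical, and the row tuples are copied by hand from the table (truncated model names kept as shown).

```python
# Rows copied from the Evaluation Results table:
# (method, model, reported F1, recall, precision, accuracy)
ROWS = [
    ("R-Tuning", "Qwen3-8B", 81.2, 84.2, 78.4, 89),
    ("TruthRL", "Qwen3-8B", 81.1, 82.5, 79.8, 90),
    ("TIAR", "Qwen3-8B", 80.4, 81.4, 79.6, 88),
    ("DPO", "Qwen3-8B", 79.6, 81.4, 77.8, 84),
    ("TruthRL", "Llama-3.1-8B-Ins...", 74.7, 75.7, 73.6, 54),
    ("TIAR", "Llama-3.1-8B-Ins...", 74.5, 75.1, 73.9, 55),
    ("RFT", "Llama-3.1-8B-Ins...", 72.2, 66.7, 78.7, 50),
    ("DPO", "Llama-3.1-8B-Ins...", 72.1, 70.1, 74.3, 48),
    ("RFT", "Qwen3-8B", 71.9, 67.2, 77.3, 73),
    ("R-Tuning", "Llama-3.1-8B-Ins...", 69.9, 66.1, 74.1, 40),
]


def abstention_f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def check_rows(rows, tol=0.1):
    """Return rows whose reported F1 differs from the recomputed F1 by more than tol."""
    mismatches = []
    for method, model, f1, recall, precision, _accuracy in rows:
        recomputed = abstention_f1(precision, recall)
        if abs(recomputed - f1) > tol:
            mismatches.append((method, model, f1, round(recomputed, 2)))
    return mismatches


if __name__ == "__main__":
    for method, model, f1, recall, precision, _accuracy in ROWS:
        recomputed = abstention_f1(precision, recall)
        print(f"{method:9} {model:20} reported={f1:5.1f} recomputed={recomputed:6.2f}")
    print("mismatches:", check_rows(ROWS))
```

Running the script prints one line per row followed by `mismatches: []`. Accuracy cannot be derived this way, since it is reported independently of the abstention metrics.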