Abstention in Question Answering on UMWP Underspecified Context
Chart: Abstention F1 over time (top result: TIAR, 94; latest point May 25, 2026). Selectable metrics: Abstention F1, Abstention Recall, Abstention Precision, Accuracy.
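The page does not define its abstention metrics. Under one common convention, abstention is treated as the positive class. Precision is the share of the model's abstentions that fall on genuinely underspecified questions. Recall is the share of underspecified questions on which the model abstains. F1 is their harmonic mean. The sketch below assumes that convention. The record layout `(unanswerable, abstained)` and the function name `abstention_metrics` are hypothetical and not taken from the benchmark's own code.

```python
# Hypothetical per-question record: (unanswerable, abstained).
# `unanswerable` is the assumed gold label (the context is underspecified);
# `abstained` is whether the model declined to answer. Both meanings are assumptions.
def abstention_metrics(records):
    """Return (precision, recall, f1) for abstention, treating abstention as the positive class."""
    tp = sum(1 for gold, pred in records if gold and pred)      # abstained on an unanswerable question
    fp = sum(1 for gold, pred in records if not gold and pred)  # abstained on an answerable question
    fn = sum(1 for gold, pred in records if gold and not pred)  # answered an unanswerable question
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    records = [(True, True), (True, False), (False, True), (False, False), (True, True)]
    p, r, f1 = abstention_metrics(records)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

The leaderboard reports these scores as percentages, so multiply by 100 to compare. Accuracy is left out because the page does not say how it is computed on UMWP.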
Evaluation Results

| Method   | Model              | Date    | Abstention F1 | Abstention Recall | Abstention Precision | Accuracy |
|----------|--------------------|---------|---------------|-------------------|----------------------|----------|
| TIAR     | Qwen3-8B           | 2026.05 | 94            | 97.6              | 90.7                 | 98.5     |
| TruthRL  | Qwen3-8B           | 2026.05 | 92.9          | 97.5              | 88.8                 | 98.3     |
| DPO      | Qwen3-8B           | 2026.05 | 92.5          | 97.6              | 87.9                 | 98.5     |
| R-Tuning | Qwen3-8B           | 2026.05 | 92.4          | 97.8              | 87.6                 | 97.9     |
| RFT      | Qwen3-8B           | 2026.05 | 91.9          | 92.9              | 90.9                 | 95.9     |
| TruthRL  | Llama-3.1-8B-Ins... | 2026.05 | 78.9          | 65.5              | 99.2                 | 90.3     |
| TIAR     | Llama-3.1-8B-Ins... | 2026.05 | 77.6          | 63.8              | 99.1                 | 91.7     |
| RFT      | Llama-3.1-8B-Ins... | 2026.05 | 76.1          | 61.7              | 99.2                 | 86.7     |
| DPO      | Llama-3.1-8B-Ins... | 2026.05 | 75.5          | 61.1              | 98.7                 | 89.7     |
| R-Tuning | Llama-3.1-8B-Ins... | 2026.05 | 75.3          | 60.7              | 99.1                 | 86.2     |
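The page does not say so, but suppose Abstention F1 is the standard harmonic mean of abstention precision and recall. Then the reported columns can be cross-checked with a short script. The rows are transcribed from the results above. The Llama model name is truncated on the source page and is kept exactly as shown. `f1_from`, `max_discrepancy`, and `best_by_model` are illustrative helper names, not part of any benchmark tooling.

```python
# Rows transcribed from the leaderboard: (method, model, F1, recall, precision), in percent.
# The Llama model name is truncated on the source page and is kept as shown.
ROWS = [
    ("TIAR", "Qwen3-8B", 94.0, 97.6, 90.7),
    ("TruthRL", "Qwen3-8B", 92.9, 97.5, 88.8),
    ("DPO", "Qwen3-8B", 92.5, 97.6, 87.9),
    ("R-Tuning", "Qwen3-8B", 92.4, 97.8, 87.6),
    ("RFT", "Qwen3-8B", 91.9, 92.9, 90.9),
    ("TruthRL", "Llama-3.1-8B-Ins...", 78.9, 65.5, 99.2),
    ("TIAR", "Llama-3.1-8B-Ins...", 77.6, 63.8, 99.1),
    ("RFT", "Llama-3.1-8B-Ins...", 76.1, 61.7, 99.2),
    ("DPO", "Llama-3.1-8B-Ins...", 75.5, 61.1, 98.7),
    ("R-Tuning", "Llama-3.1-8B-Ins...", 75.3, 60.7, 99.1),
]


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (standard F1; assumed, not stated on the page)."""
    return 2 * precision * recall / (precision + recall)


def max_discrepancy(rows):
    """Largest gap between a reported F1 and the F1 recomputed from its precision and recall."""
    return max(abs(f1 - f1_from(p, r)) for _, _, f1, r, p in rows)


def best_by_model(rows):
    """Map each base model to the method with the highest reported Abstention F1."""
    best = {}
    for method, model, f1, _, _ in rows:
        if model not in best or f1 > best[model][1]:
            best[model] = (method, f1)
    return {model: method for model, (method, _) in best.items()}


if __name__ == "__main__":
    print(f"max |reported - recomputed| = {max_discrepancy(ROWS):.3f}")
    print(best_by_model(ROWS))
```

Under that assumption, every recomputed value lands within 0.05 of the reported F1, which is consistent with rounding to one decimal place. The script also reproduces the table's ordering. TIAR leads on Qwen3-8B, while TruthRL leads on Llama-3.1-8B-Ins..., where every method pairs high abstention precision with much lower recall.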