Abstention in Question Answering on KUQ Cont. Subjective
[Chart: Abstention F1 over time. Highlighted point: TIAR, 87.2 Abstention F1, May 25, 2026. Selectable metrics: Abstention F1, Abstention Recall, Abstention Precision, Accuracy.]
Updated 8d ago
Evaluation Results
| Method | Model | Date | Abstention F1 | Abstention Recall | Abstention Precision | Accuracy |
|---|---|---|---|---|---|---|
| TIAR | Qwen3-8B | 2026.05 | 87.2 | 90.8 | 83.8 | 95.7 |
| DPO | Qwen3-8B | 2026.05 | 87.1 | 93.2 | 81.8 | 96.4 |
| R-Tuning | Qwen3-8B | 2026.05 | 87 | 91.7 | 82.8 | 95.7 |
| TruthRL | Qwen3-8B | 2026.05 | 86.7 | 91.7 | 82.2 | 96.7 |
| RFT | Llama-3.1-8B-Ins... | 2026.05 | 77.7 | 70.9 | 86 | 67.8 |
| TIAR | Llama-3.1-8B-Ins... | 2026.05 | 76.2 | 69.7 | 83.9 | 70.7 |
| R-Tuning | Llama-3.1-8B-Ins... | 2026.05 | 75 | 67.7 | 84.1 | 66.3 |
| TruthRL | Llama-3.1-8B-Ins... | 2026.05 | 74.8 | 66.5 | 85.5 | 70.7 |
| DPO | Llama-3.1-8B-Ins... | 2026.05 | 74.6 | 68.2 | 82.1 | 66.7 |
| RFT | Qwen3-8B | 2026.05 | 72.4 | 68.2 | 77.2 | 88 |
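
The page does not define Abstention F1. The sketch below assumes it is the usual harmonic mean of Abstention Precision and Abstention Recall, and recomputes it for a few rows copied from the results above. The function names `f1` and `recheck`, and the 0.1 tolerance for rounding in the published figures, are illustrative choices rather than anything specified by the benchmark.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method, model, recall, precision, reported F1), copied from the results above.
ROWS = [
    ("TIAR", "Qwen3-8B", 90.8, 83.8, 87.2),
    ("DPO", "Qwen3-8B", 93.2, 81.8, 87.1),
    ("RFT", "Llama-3.1-8B-Ins...", 70.9, 86.0, 77.7),
    ("RFT", "Qwen3-8B", 68.2, 77.2, 72.4),
]


def recheck(rows, tol=0.1):
    """Return (method, model, recomputed F1, reported F1, within_tol) per row.

    tol is an assumed allowance for rounding in the published numbers.
    """
    out = []
    for method, model, recall, precision, reported in rows:
        recomputed = f1(precision, recall)
        out.append((method, model, recomputed, reported,
                    abs(recomputed - reported) <= tol))
    return out


if __name__ == "__main__":
    for method, model, recomputed, reported, ok in recheck(ROWS):
        print(f"{method:8s} {model:20s} recomputed={recomputed:.2f} "
              f"reported={reported} consistent={ok}")
```

Under this assumed definition, the sampled rows agree with the reported Abstention F1 up to rounding. Accuracy is a separate column and cannot be derived from precision and recall alone.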