Long-form QA Factuality Detection on Long-form QA Benchmark

17.3 PR-AUC

FRANQ condition-calibrated

Updated 3mo ago

Evaluation Results

Method (date)	PR-AUC	(unlabeled)
FRANQ condition-calibrated 2025.05	17.3	0.354
CCP 2025.05	16.2	0.181
FRANQ calibrated 2025.05	15	0.401
FRANQ condition-calibrated 2025.05	14	0.223
FRANQ no calibration 2025.05	13.5	0.362
Max Token Entropy 2025.05	13	0.219
Max Claim Prob. 2025.05	12.6	0.258
XGBoost (all UQ features) 2025.05	12.4	0.206
P(True) 2025.05	11.7	0.207
Parametric Knowledge 2025.05	11.2	0.183
XGBoost (FRANQ features) 2025.05	11.1	0.149
Max Token Entropy 2025.05	10.9	0.115
AlignScore 2025.05	10.4	0.233
FRANQ calibrated 2025.05	10.3	0.256
Max Token Entropy 2025.05	10.2	0.138
FRANQ no calibration 2025.05	10	0.181
P(True) 2025.05	9.6	0.148
Perplexity 2025.05	9	0.165
XGBoost (FRANQ features) 2025.05	9	0.158
FRANQ condition-calibrated 2025.05	9	0.208
XGBoost (all UQ features) 2025.05	8.8	0.198
CCP 2025.05	8.7	0.216
CCP 2025.05	8.5	0.169
FRANQ condition-calibrated 2025.05	8.1	0.184
XGBoost (FRANQ features) 2025.05	8	0.086
FRANQ no calibration 2025.05	8	0.2
P(True) 2025.05	7.7	0.17
Perplexity 2025.05	7.5	0.09
AlignScore 2025.05	7.5	0.108
FRANQ calibrated 2025.05	7.4	0.09
XGBoost (all UQ features) 2025.05	7.3	0.085
P(True) 2025.05	7.1	0.112
AlignScore 2025.05	6.8	0.119
Parametric Knowledge 2025.05	6.7	0.029
Parametric Knowledge 2025.05	6.4	0.018
FRANQ no calibration 2025.05	6.3	0.162
Max Claim Prob. 2025.05	6.1	0
CCP 2025.05	6.1	0.108
AlignScore 2025.05	6.1	0.058
Parametric Knowledge 2025.05	5.9	0.047
Max Claim Prob. 2025.05	5.8	-0.029
Perplexity 2025.05	5.6	-0.081
Max Claim Prob. 2025.05	5.5	0.118
Max Token Entropy 2025.05	5.1	-0.003
Perplexity 2025.05	4.8	-0.071
XGBoost (FRANQ features) 2025.05	4.8	0.017
XGBoost (all UQ features) 2025.05	4.4	-
FRANQ calibrated 2025.05	4.3	-0.047