
Long-form QA Factuality Detection on Long-form QA Benchmark

17.3 PR-AUC

FRANQ condition-calibrated

May 27, 2025
Updated 1mo ago

Evaluation Results

Date      PR-AUC    (unlabeled)
2025.05   17.3      0.354
2025.05   16.2      0.181
2025.05   15        0.401
2025.05   14        0.223
2025.05   13.5      0.362
2025.05   13        0.219
2025.05   12.6      0.258
2025.05   12.4      0.206
2025.05   11.7      0.207
2025.05   11.2      0.183
2025.05   11.1      0.149
2025.05   10.9      0.115
2025.05   10.4      0.233
2025.05   10.3      0.256
2025.05   10.2      0.138
2025.05   10        0.181
2025.05   9.6       0.148
2025.05   9         0.165
2025.05   9         0.158
2025.05   9         0.208
2025.05   8.8       0.198
2025.05   8.7       0.216
2025.05   8.5       0.169
2025.05   8.1       0.184
2025.05   8         0.086
2025.05   8         0.2
2025.05   7.7       0.17
2025.05   7.5       0.09
2025.05   7.5       0.108
2025.05   7.4       0.09
2025.05   7.3       0.085
2025.05   7.1       0.112
2025.05   6.8       0.119
2025.05   6.7       0.029
2025.05   6.4       0.018
2025.05   6.3       0.162
2025.05   6.1       0
2025.05   6.1       0.108
2025.05   6.1       0.058
2025.05   5.9       0.047
2025.05   5.8       -0.029
2025.05   5.6       -0.081
2025.05   5.5       0.118
2025.05   5.1       -0.003
2025.05   4.8       -0.071
2025.05   4.8       0.017
2025.05   4.4       -
2025.05   4.3       -0.047