
Factuality Detection on Short-form QA (Average of NQ, PopQA, TriviaQA, SimpleQA) (test)

71.1 PR-AUC

FRANQ condition-calibrated

[Chart: PR-AUC by submission date; entries dated May 27, 2025]

Evaluation Results

Date     PR-AUC  2nd metric (unlabeled)  Method
2025.05  71.1    47.7                    FRANQ condition-calibrated
2025.05  70.5    46.2
2025.05  70.2    46.4
2025.05  70      46
2025.05  67.2    41.1
2025.05  67      36.8
2025.05  66.6    37.2
2025.05  64.7    54
2025.05  64.4    53.4
2025.05  64.1    30.4
2025.05  64.1    34.5
2025.05  64      49.1
2025.05  63.9    53.2
2025.05  63.7    51.9
2025.05  63.4    50.3
2025.05  63.1    54.1
2025.05  62.9    52
2025.05  62.8    25.6
2025.05  62.8    51.8
2025.05  62.8    48.9
2025.05  62.8    53.7
2025.05  62.7    49.2
2025.05  62.3    27.8
2025.05  61.8    27.7
2025.05  61.3    24.2
2025.05  61.3    52.5
2025.05  60.2    26.3
2025.05  59.4    48.1
2025.05  59.4    49.4
2025.05  57.1    48.3
2025.05  56.9    40.7
2025.05  56.4    47.9
2025.05  55.8    45.4
2025.05  55.6    41.4
2025.05  55.6    10.4
2025.05  55.3    41.7
2025.05  55.3    40.3
2025.05  55.1    44.3
2025.05  52.6    40.9
2025.05  52.4    38.5
2025.05  52.3    34
2025.05  49.9    33
2025.05  49.6    28.3
2025.05  48.1    25.8
2025.05  47.4    30.1
2025.05  46.7    26
2025.05  46.6    26.1
2025.05  46.4    26
2025.05  44.7    22.5
2025.05  43.2    22.4
2025.05  43      24
2025.05  42.5    24.7
2025.05  42.3    23
2025.05  41.6    17.4
2025.05  41.5    20.7
2025.05  41.4    19.6
2025.05  41.2    19.8
2025.05  40      16.2
2025.05  37.6    15.8
2025.05  36.4    10.5
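The headline score is PR-AUC averaged over four short-form QA datasets (NQ, PopQA, TriviaQA, SimpleQA). The page does not say how PR-AUC is estimated or how the average is formed, so the following is only a sketch under stated assumptions: PR-AUC is computed as step-wise average precision, the positive class is assumed to be a non-factual claim, per-dataset scores are assumed to be averaged without weights, and the displayed numbers are assumed to be fractions multiplied by 100. The function names `average_precision` and `macro_average` are hypothetical.

```python
from typing import Mapping, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise PR-AUC: sum over thresholds of (R_k - R_{k-1}) * P_k.

    labels: 1 = positive class (ASSUMED here: a non-factual claim), 0 = negative.
    scores: detector outputs; a higher score means "more likely positive".
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("PR-AUC is undefined when there are no positive examples")

    # Rank examples from most to least suspicious.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)

    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        # Consume every example tied at this score, so each distinct
        # score value contributes exactly one point on the PR curve.
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def macro_average(per_dataset: Mapping[str, float]) -> float:
    """Unweighted mean over datasets (ASSUMED aggregation for the 'Average' score)."""
    return sum(per_dataset.values()) / len(per_dataset)
```

For example, `average_precision([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.2])` evaluates to 29/36 ≈ 0.806, which would appear as 80.6 under the percentage assumption. Grouping tied scores into a single threshold keeps the result independent of how ties happen to be ordered.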