Long-form generation factuality and uncertainty estimation on WildHallu (test)

Best result: LOGU-DPO, Factuality Score 0.86

[Chart: Factuality Score over time (y-axis roughly 0.71 to 0.83; x-axis date: Oct 18, 2024)]

Evaluation Results

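Method names and links were not preserved in this copy, apart from the top entry (rank 1, LOGU-DPO). All entries date from 2024.10. The two metric columns after Factuality Score were unlabeled; "-" means not reported.

Rank  Date     Factuality Score  Metric 2  Metric 3
1     2024.10  0.86              52.8      2.68
2     2024.10  0.844             61.8      3.49
3     2024.10  0.84              48.1      2.33
4     2024.10  0.802             48.6      3.44
5     2024.10  0.792             51.1      5.73
6     2024.10  0.789             44.6      5.37
7     2024.10  0.787             45.8      3.29
8     2024.10  0.77              28.7      5.13
9     2024.10  0.754             48.6      5.2
10    2024.10  0.744             -         6.24
11    2024.10  0.734             51.6      9.8
12    2024.10  0.726             57.3      10.1
13    2024.10  0.724             50.6      8.29
14    2024.10  0.715             -         8.31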