Factuality Evaluation on HotpotQA

Average Score: 0.686

RLFH

Jun 18, 2024
Updated 4d ago

Evaluation Results

Date      Average Score
2024.06   0.686    6.23    2.1     100    0.714
2024.06   0.668    7.3     3.66    90     0.651
2024.06   0.653    2.49    1.31    100    0.635
2024.06   0.646    4.48    1.91    99     0.649
2024.06   0.645    4.9     2.18    99     0.652
2024.06   0.639    4.57    2.44    99     0.652
2024.06   0.638    9.13    4.8     93     0.634
2024.06   0.618    15.4    9.22    96     0.642
2024.06   0.593    5.14    3.06    90     0.608
2024.06   0.591    7.36    3.81    96     0.633
2024.06   0.546    6.61    6       90     0.524
2024.06   0.536    12.7    12      100    0.533