
Multi-hop Reasoning on 2WikiMHQA

0.7002 AUROC

CoT-UQ

[Chart: AUROC over time; x-axis date Feb 24, 2025]
Updated 4d ago

Evaluation Results

Date     AUROC
2025.02  0.7002
2025.02  0.681
2025.02  0.6606
2025.02  0.6558
2025.02  0.6538
2025.02  0.6522
2025.02  0.6403
2025.02  0.6329
2025.02  0.6263
2025.02  0.5981
2025.02  0.5838
2025.02  0.5819
2025.02  0.5777
2025.02  0.5692
2025.02  0.5681
2025.02  0.568
2025.02  0.5639
2025.02  0.556
2025.02  0.5356
2025.02  0.5352
2025.02  0.5154
2025.02  0.5128
2025.02  0.5108
2025.02  0.4978
2025.02  0.4752
2025.02  0.4713