
Multi-hop Question Answering on Average (MuSiQue, HotpotQA, 2WikiMultiHopQA, LongSeal)

Average QA Score: 46.49

BRIEF-PRO-LOW

[Chart: Average QA Score]
Oct 15, 2025
Updated 1mo ago

Evaluation Results

Method    Links
2025.10    46.4925
2025.10    45.5832
2025.10    45.3342
2025.10    44.981
2025.10    43.0168
2025.10    41.552
2025.10    41.425
2025.10    41.3742
2025.10    41.2212
2025.10    40.919
2025.10    40.832
2025.10    40.0625
2025.10    39.5968
2025.10    39.28110
2025.10    38.99110
2025.10    38.7932
2025.10    38.6442
2025.10    36.87110
2025.10    36.523
2025.10    36.2126
2025.10    35.181
2025.10    34.0812
2025.10    34.0268
2025.10    33.762
2025.10    33.531
2025.10    33.1112
2025.10    33.039
2025.10    33.013
2025.10    32.9733
2025.10    32.862
2025.10    32.091
2025.10    32.029
2025.10    31.3747
2025.10    30.6526
2025.10    30.21
2025.10    30.139.5
2025.10    29.2133
2025.10    28.653
2025.10    28.0647
2025.10    26.0626
2025.10    25.0647
2025.10    24.8333
2025.10    23.2510
2025.10    22.1332
2025.10    21.7510
2025.10    20.7932
2025.10    20.729.5
2025.10    20.210
2025.10    16.5510
2025.10    15.832