Multi-hop Retrieval on Average (HotpotQA, 2WikiMultihopQA, MuSiQue, MoreHopQA) (test)
[Chart: Average Recall. Top entry: ThinkGR, 78.44 (May 21, 2026)]
Evaluation Results
Method         Model Parameters   Date      Average Recall
ThinkGR        8B                 2026.05   78.44
w/o Thought    8B                 2026.05   73.53
w/o RL         8B                 2026.05   71.73
GritHopper     7B                 2026.05   71.58
R3-RAG         8B                 2026.05   65.88
RT-RAG         8B                 2026.05   61.47
IRCoT          70B                2026.05   59.42
ITER-RETGEN    70B                2026.05   55.5
Auto-RAG       7B                 2026.05   55.17
MDR            110M               2026.05   54.73
BGE-large      326M               2026.05   49.97
SEAL           406M               2026.05   46.84
w/o SFT        8B                 2026.05   46.15
Selfask        70B                2026.05   45.84
Contriever     -                  2026.05   45.17
BM25           -                  2026.05   42.88