Multi-hop Retrieval on HotPotQA (Including Efficiency Metrics)
[Chart: Recall by method. Highlighted point: FrugalRAG-7B, Recall 70.4, Jul 10, 2025. Selectable metrics: Recall, Average Tokens, FLOPs, Latency (s), Search Count.]
Updated 1mo ago
Evaluation Results
| Method | Model Scale | Date | Recall | Average Tokens | FLOPs | Latency (s) | Search Count |
|---|---|---|---|---|---|---|---|
| FrugalRAG-7B | 7B | 2025.07 | 70.4 | 9,138 | 1.27 | 0.2415 | 2.89 |
| R1 Searcher-7B | 7B | 2025.07 | 69.1 | 4,458 | 6.17 | 0.1335 | 2.22 |
| SimpleDeepSearcher-7B | 7B | 2025.07 | 64.8 | 9,657 | 1.31 | 0.6571 | 2.75 |
| CoRAG-8B | 8B | 2025.07 | 64.3 | 7,770 | 1.24 | 0.1486 | 4 |
| O2 Searcher-3B | 3B | 2025.07 | 50.1 | 2,937 | 1.79 | 0.105 | 1.77 |
| Search-R1-7B | 7B | 2025.07 | 48.2 | 2,119 | 2.99 | 0.065 | 1.28 |