Multi-hop Question Answering on MuSiQue (Recall %)
Top result: R1 Searcher-7B, Recall 57.7 (Jul 10, 2025)
Evaluation Results
| Method | Details | Date | Recall (%) |
|---|---|---|---|
| R1 Searcher-7B | Avg Tokens=4,665, FLOP... | 2025.07 | 57.7 |
| CoRAG-8B | Avg Tokens=7,683, FLOP... | 2025.07 | 54 |
| FrugalRAG-7B | Avg Tokens=11,914, FLO... | 2025.07 | 52.6 |
| SimpleDeepSearcher-7B | Avg Tokens=10,027, FLO... | 2025.07 | 50.4 |
| Search-R1-7B | Avg Tokens=2,212, FLOP... | 2025.07 | 38.1 |
| O2 Searcher-3B | Avg Tokens=2,923, FLOP... | 2025.07 | 37 |