Multi-hop QA on Bamboogle (F1)
[Chart: F1 Score over time. Labeled point: Step-Level, 72. Date shown: May 21, 2026.]
Updated 9d ago
Evaluation Results
Method                                  Date      F1 Score
Step-Level (retrieval_strategy=ste...)  2026.05   72
Search-R1                               2026.05   68
IRCoT                                   2026.05   65.7
Static RAG                              2026.05   61.3