Multi-hop Question Answering on HotpotQA (F1, Toks)
[Chart: F1 Score and Token Count by method; top result CurvFlag, F1 29.1, dated Feb 13, 2026]
Updated 3mo ago
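The page does not define its F1 Score. HotpotQA's standard answer metric is token-level F1 between the predicted and gold answer strings, averaged over questions and reported on a 0-100 scale. The sketch below assumes that is what this leaderboard reports (it could instead be a supporting-fact or joint F1). It uses SQuAD-style normalization (lowercasing, dropping punctuation and the articles a/an/the) and, as the official HotpotQA evaluation script does, gives zero credit when a yes/no answer does not match exactly. The function names are illustrative, not an official API.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)
SPECIAL = {"yes", "no", "noanswer"}


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between one predicted and one gold answer (0.0 to 1.0)."""
    pred_norm = normalize_answer(prediction)
    gold_norm = normalize_answer(ground_truth)
    # Yes/no style answers get no partial credit: they must match exactly.
    if (pred_norm in SPECIAL or gold_norm in SPECIAL) and pred_norm != gold_norm:
        return 0.0
    pred_tokens = pred_norm.split()
    gold_tokens = gold_norm.split()
    # Multiset intersection counts each shared token at most min(count) times.
    num_same = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def corpus_f1(predictions, ground_truths) -> float:
    """Mean per-question F1, scaled to 0-100 like the leaderboard column."""
    scores = [f1_score(p, g) for p, g in zip(predictions, ground_truths)]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    print(f1_score("the Eiffel Tower", "Eiffel Tower"))  # 1.0
    print(round(f1_score("Paris, France", "Paris"), 3))  # 0.667
```

Averaging per-question F1 (rather than pooling tokens across the whole set) is what makes a single long answer unable to dominate the score.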
Evaluation Results
Method | Setting | Paper date | F1 Score | Token Count
--- | --- | --- | --- | ---
CurvFlag | Cost target=T, Routing... | 2026.02 | 29.1 | 148
FLARE+Tex | Cost target=T, Routing... | 2026.02 | 28.3 | 155
SelfRt+Tex | Cost target=T, Routing... | 2026.02 | 24.8 | 301
FLARE | Cost target=T, Routing... | 2026.02 | 21.4 | 163
Self-Route | Cost target=T, Routing... | 2026.02 | 18.6 | 9,271
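The results report two metrics that pull in opposite directions: higher F1 is better, and, assuming Token Count measures the tokens a method consumes (the page does not say whether it is per question or in total), lower is better. A minimal sketch of a Pareto-dominance check over the reported values follows; `Result`, `dominates`, and `pareto_front` are hypothetical helpers, not part of any leaderboard API. On these numbers CurvFlag has both the highest F1 and the lowest token count, so it is the only non-dominated entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    method: str
    f1: float
    tokens: int


# Values copied from the Evaluation Results table.
RESULTS = [
    Result("CurvFlag", 29.1, 148),
    Result("FLARE+Tex", 28.3, 155),
    Result("SelfRt+Tex", 24.8, 301),
    Result("FLARE", 21.4, 163),
    Result("Self-Route", 18.6, 9271),
]


def dominates(a: Result, b: Result) -> bool:
    """a dominates b if it is no worse on both axes and strictly better on one.

    Assumes higher F1 and lower token count are both preferable.
    """
    no_worse = a.f1 >= b.f1 and a.tokens <= b.tokens
    strictly_better = a.f1 > b.f1 or a.tokens < b.tokens
    return no_worse and strictly_better


def pareto_front(results):
    """Return the results that no other result dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results if o is not r)]


if __name__ == "__main__":
    for r in pareto_front(RESULTS):
        print(r.method)  # CurvFlag
```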