Reasoning on HotpotQA (Solve Rate and Executability)
[Chart: Solve Rate over time. Highlighted point: CapFlow, Solve Rate 88.12, Feb 11, 2026. Available metrics: Solve Rate, Executability.]
Updated 1mo ago
Evaluation Results
Method        Details                     Date      Solve Rate   Executability
CapFlow       Type=Learning, Setting...   2026.02   88.12        -
CapFlow       Type=Learning, Setting...   2026.02   86.37        -
ScoreFlow     Type=Learning, Setting...   2026.02   86           -
AFlow         Type=Refinement, Setti...   2026.02   85.87        -
ScoreFlow     Type=Learning, Setting...   2026.02   85.37        -
ADAS          Type=Refinement, Setti...   2026.02   82.84        -
CoT-SC        Type=Manual, Setting=M...   2026.02   82.07        -
CoT           Type=Manual, Setting=M...   2026.02   81.75        -
SPP           Type=Manual, Setting=M...   2026.02   81.62        -
GPT-4o-mini   Type=Manual, Setting=M...   2026.02   81.54        -
Self-Refine   Type=Manual, Setting=M...   2026.02   81.18        -