Open-ended Question Answering on HotpotQA (test)
[Chart: Accuracy on HotpotQA (test). Top result: RouteGoT, 88 (Mar 6, 2026). Metrics shown: Accuracy, Average Output Tokens.]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | Accuracy | Average Output Tokens |
|---|---|---|---|---|
| RouteGoT | Backbone={Qwen3-4B, 8B... | 2026.03 | 88 | 592 |
| GoT* | Backbone=Qwen3-30B | 2026.03 | 87 | 812 |
| CoT | Backbone=Qwen3-30B | 2026.03 | 86 | 431 |
| RouteLLM | Backbone={Qwen3-4B, 8B... | 2026.03 | 83 | 555 |
| IO | Backbone=Qwen3-30B | 2026.03 | 81 | 11 |
| KNN | Backbone={Qwen3-4B, 8B... | 2026.03 | 79 | 627 |
| RTR | Backbone={Qwen3-4B, 8B... | 2026.03 | 79 | 554 |
| Random | Backbone={Qwen3-4B, 8B... | 2026.03 | 77 | 1,685 |
| EmbedLLM | Backbone={Qwen3-4B, 8B... | 2026.03 | 77 | 2,040 |
| AGoT | Backbone=Qwen3-30B | 2026.03 | 72 | 2,583 |
| ToT | Backbone=Qwen3-30B | 2026.03 | 60 | 3,748 |