Open-ended Question Answering on HybridQA (test)
[Chart: Accuracy and Average Output Tokens by method; top entry ToT, Accuracy 91, Mar 6, 2026]
Updated 1mo ago
Evaluation Results
Method     Backbone            Date      Accuracy   Average Output Tokens
ToT        Qwen3-30B           2026.03   91         3,538
RouteGoT   {Qwen3-4B, 8B...    2026.03   91         700
GoT*       Qwen3-30B           2026.03   88         1,635
CoT        Qwen3-30B           2026.03   84         476
AGoT       Qwen3-30B           2026.03   84         2,097
Random     {Qwen3-4B, 8B...    2026.03   70         930
KNN        {Qwen3-4B, 8B...    2026.03   68         510
RTR        {Qwen3-4B, 8B...    2026.03   68         527
IO         Qwen3-30B           2026.03   67         21
EmbedLLM   {Qwen3-4B, 8B...    2026.03   66         1,376
RouteLLM   {Qwen3-4B, 8B...    2026.03   65         540