Explorative Reasoning on Crosswords Word-level (test)
[Chart: Accuracy and Average Output Tokens. Highlighted result: CoT, 13.5 Accuracy, Mar 6, 2026.]
Evaluation Results

| Method | Backbone | Date | Accuracy | Average Output Tokens |
| --- | --- | --- | --- | --- |
| CoT | Qwen3-30B | 2026.03 | 13.5 | 13,201 |
| RouteGoT | {Qwen3-4B, 8B... | 2026.03 | 11 | 3,804 |
| GoT* | Qwen3-30B | 2026.03 | 6.5 | 11,674 |
| IO | Qwen3-30B | 2026.03 | 6 | 467 |
| EmbedLLM | {Qwen3-4B, 8B... | 2026.03 | 5 | 6,017 |
| AGoT | Qwen3-30B | 2026.03 | 3.5 | 20,893 |
| RouteLLM | {Qwen3-4B, 8B... | 2026.03 | 2.5 | 5,831 |
| ToT | Qwen3-30B | 2026.03 | 2.1 | 9,563 |
| RTR | {Qwen3-4B, 8B... | 2026.03 | 1.5 | 4,065 |
| Random | {Qwen3-4B, 8B... | 2026.03 | 1 | 6,635 |
| KNN | {Qwen3-4B, 8B... | 2026.03 | 0.5 | 5,813 |