Explorative Reasoning on Crosswords Letter-level (test)
[Chart: plotted point RouteGoT, 23.4 Accuracy, Mar 6, 2026. Metrics: Accuracy, Average Output Tokens.]
Updated 1mo ago
Evaluation Results

Method     Configuration                Date      Accuracy   Average Output Tokens
RouteGoT   Backbone={Qwen3-4B, 8B...    2026.03   23.4       3,804
GoT*       Backbone=Qwen3-30B           2026.03   22.4       11,674
CoT        Backbone=Qwen3-30B           2026.03   22.2       13,201
IO         Backbone=Qwen3-30B           2026.03   22         467
EmbedLLM   Backbone={Qwen3-4B, 8B...    2026.03   19         6,017
RouteLLM   Backbone={Qwen3-4B, 8B...    2026.03   12.8       5,831
ToT        Backbone=Qwen3-30B           2026.03   12.4       9,563
AGoT       Backbone=Qwen3-30B           2026.03   11         20,893
RTR        Backbone={Qwen3-4B, 8B...    2026.03   9.4        4,065
Random     Backbone={Qwen3-4B, 8B...    2026.03   7.6        6,635
KNN        Backbone={Qwen3-4B, 8B...    2026.03   6.2        5,813