Maze Solving on Drunkard maze (test)
[Chart: Plan Accuracy by date. Highlighted point: Solution-only, 61.5, May 19, 2025. Selectable metrics: Plan Accuracy, Trace Validity (Valid Plans).]
Updated 7d ago
Evaluation Results

Method (configuration, date)                          Plan Accuracy   Trace Validity (Valid Plans)
Solution-only (Backbone=Qwen3-8B-base..., 2025.05)    61.5            0
Correct Traces (Backbone=Qwen3-8B-base..., 2025.05)   53.4            33.7
Swapped Traces (Backbone=Qwen3-8B-base..., 2025.05)   38.1            0
Base Model (Backbone=Qwen3-8B-base..., 2025.05)       14.5            0