Maze Solving on DFS maze (test)
[Chart: Plan Accuracy and Trace Validity. Highlighted: Correct Traces, Plan Accuracy 99.3. Date shown: May 19, 2025]
Evaluation Results
| Method | Details | Date | Plan Accuracy | Trace Validity |
|---|---|---|---|---|
| Correct Traces | Backbone=Qwen3-8B-base... | 2025.05 | 99.3 | 100 |
| Solution-only | Backbone=Qwen3-8B-base... | 2025.05 | 78.2 | 0 |
| Swapped Traces | Backbone=Qwen3-8B-base... | 2025.05 | 76.6 | 0 |
| Base Model | Backbone=Qwen3-8B-base... | 2025.05 | 1.3 | 0 |