Puzzle Solving on Sudoku Out Of Distribution
[Chart: Average @128 over time. Best result: Markov, 77.8 (Mar 20, 2026).]
Evaluation Results
| Method | Details | Date | Average @128 | Pass Rate @128 |
|---|---|---|---|---|
| Markov | Model=Qwen3-4B, Traini... | 2026.03 | 77.8 | 82 |
| State-action-sequence | Model=Qwen3-4B, Traini... | 2026.03 | 71.2 | 82 |
| Action-sequence | Model=Qwen3-4B, Traini... | 2026.03 | 69.2 | 82 |
| State-action-sequence | Model=Qwen2.5-3B-It, T... | 2026.03 | 57.1 | 69 |
| Markov | Model=Qwen2.5-3B-It, T... | 2026.03 | 56.4 | 68 |
| Markov | Model=Qwen3-4B, Traini... | 2026.03 | 8.7 | 86 |
| Action-sequence | Model=Qwen3-4B, Traini... | 2026.03 | 3.1 | 64 |
| Markov | Model=Qwen2.5-3B-It, T... | 2026.03 | 2.9 | 75 |
| State-action-sequence | Model=Qwen2.5-3B-It, T... | 2026.03 | 2.4 | 71 |
| State-action-sequence | Model=Qwen3-4B, Traini... | 2026.03 | 1.9 | 62 |
| Action-sequence | Model=Qwen2.5-3B-It, T... | 2026.03 | 0 | 0 |
| Action-sequence | Model=Qwen2.5-3B-It, T... | 2026.03 | 0 | 0 |