Puzzle Solving on Futoshiki (In Distribution)
Best result: Markov, 79.8 Avg@128 (Mar 20, 2026)
[Chart: Avg@128 and Pass@128 over time]
Evaluation Results
Method                 Configuration              Date     Avg@128  Pass@128
Markov                 Model=Qwen2.5-3B-It, T...  2026.03  79.8     96
Markov                 Model=Qwen3-4B, Traini...  2026.03  75       85
State-action-sequence  Model=Qwen2.5-3B-It, T...  2026.03  67.4     94
Action-sequence        Model=Qwen2.5-3B-It, T...  2026.03  61.3     84
State-action-sequence  Model=Qwen3-4B, Traini...  2026.03  44.4     55
State-action-sequence  Model=Qwen2.5-3B-It, T...  2026.03  16.6     100
Markov                 Model=Qwen2.5-3B-It, T...  2026.03  8.5      98
Action-sequence        Model=Qwen2.5-3B-It, T...  2026.03  6.6      94
State-action-sequence  Model=Qwen3-4B, Traini...  2026.03  0.3      20
Action-sequence        Model=Qwen3-4B, Traini...  2026.03  0.1      7
Markov                 Model=Qwen3-4B, Traini...  2026.03  0.1      11
Action-sequence        Model=Qwen3-4B, Traini...  2026.03  0        5
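
This page does not define Avg@128 or Pass@128. The sketch below assumes the common reading: Avg@k is the mean success rate over k sampled attempts per puzzle, and Pass@k is the share of puzzles solved by at least one of the k attempts. Under that assumption, a row can pair a low Avg@128 with a high Pass@128 (for example 16.6 and 100) when most puzzles are solved only occasionally. The function names and the toy data are hypothetical and not taken from the benchmark.

```python
from typing import Sequence


def avg_at_k(results: Sequence[Sequence[bool]]) -> float:
    """Mean per-attempt success rate across puzzles, as a percentage.

    results[i][j] is True if attempt j on puzzle i was correct.
    Assumed definition; the benchmark page does not state one.
    """
    per_puzzle = [sum(attempts) / len(attempts) for attempts in results]
    return 100.0 * sum(per_puzzle) / len(per_puzzle)


def pass_at_k(results: Sequence[Sequence[bool]]) -> float:
    """Percentage of puzzles with at least one correct attempt.

    Assumed definition; the benchmark page does not state one.
    """
    solved = [any(attempts) for attempts in results]
    return 100.0 * sum(solved) / len(solved)


# Toy data: 3 puzzles, 4 attempts each (k = 4 instead of 128 for brevity).
toy_results = [
    [True, False, False, False],   # solved once in four attempts
    [False, False, False, False],  # never solved
    [True, True, True, True],      # always solved
]

print(f"Avg@4:  {avg_at_k(toy_results):.1f}")
print(f"Pass@4: {pass_at_k(toy_results):.1f}")
```

In the toy data, the first puzzle counts fully toward Pass@4 but contributes only a quarter of a solve to Avg@4, which is the gap the two leaderboard columns would expose under this reading.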