Puzzle Solving on Futoshiki (Out Of Distribution)
[Chart: Avg@128 over time. Metrics shown: Avg@128, Pass@128.]
Best result: 42.6 Avg@128 (Markov), Mar 20, 2026. Updated 26d ago.
Evaluation Results
| Method                | Method details            | Date    | Avg@128 | Pass@128 |
|-----------------------|---------------------------|---------|---------|----------|
| Markov                | Model=Qwen3-4B, Traini... | 2026.03 | 42.6    | 53       |
| Markov                | Model=Qwen2.5-3B-It, T... | 2026.03 | 28.3    | 67       |
| State-action-sequence | Model=Qwen2.5-3B-It, T... | 2026.03 | 25.2    | 75       |
| Action-sequence       | Model=Qwen2.5-3B-It, T... | 2026.03 | 24.9    | 56       |
| State-action-sequence | Model=Qwen3-4B, Traini... | 2026.03 | 16.9    | 21       |
| State-action-sequence | Model=Qwen2.5-3B-It, T... | 2026.03 | 1.1     | 60       |
| Action-sequence       | Model=Qwen2.5-3B-It, T... | 2026.03 | 0.3     | 28       |
| Markov                | Model=Qwen2.5-3B-It, T... | 2026.03 | 0.3     | 26       |
| Action-sequence       | Model=Qwen3-4B, Traini... | 2026.03 | 0       | 0        |
| Action-sequence       | Model=Qwen3-4B, Traini... | 2026.03 | 0       | 0        |
| Markov                | Model=Qwen3-4B, Traini... | 2026.03 | 0       | 0        |
| State-action-sequence | Model=Qwen3-4B, Traini... | 2026.03 | 0       | 1        |