Text Game on Sokoban (test)
[Chart: Accuracy by date. Highlighted result: OPCD, 53.9 Accuracy (Feb 12, 2026). Metrics shown: Accuracy, IF-Eval.]
Updated 4d ago
Evaluation Results
| Method | Model | Date | Accuracy | IF-Eval |
|---|---|---|---|---|
| OPCD | Qwen3-4B-Ins | 2026.02 | 53.9 | 82.4 |
| Context Distill. | Qwen3-4B-Ins | 2026.02 | 51.6 | 82.3 |
| In-Context | Qwen3-4B-Ins | 2026.02 | 48.4 | - |
| Base Model | Qwen3-4B-Ins | 2026.02 | 9.4 | 82.8 |