Text Game on Frozen Lake (test)
Top result: OPCD, 38.3 Accuracy (Feb 12, 2026)
[Chart: Accuracy by date; metrics shown: Accuracy, IF-Eval]
Updated 4d ago
Evaluation Results
| Method | Model | Date | Accuracy | IF-Eval |
|---|---|---|---|---|
| OPCD | Qwen3-1.7B | 2026.02 | 38.3 | 66.7 |
| Context Distill. | Qwen3-1.7B | 2026.02 | 35.2 | 65.4 |
| In-Context | Qwen3-1.7B | 2026.02 | 31.4 | - |
| Base Model | Qwen3-1.7B | 2026.02 | 6.3 | 67.3 |