Frozen Lake: evaluation on Frozen Lake 128 samples (test)
[Chart: Accuracy by date. Base Model: 6.3 on Feb 12, 2026. Metrics: Accuracy, IF-Eval (OOD).]
Updated 4d ago
Evaluation Results
Method | Links | Accuracy | IF-Eval (OOD)
Base Model (Model=Qwen3-1.7B, 2026.02) | | 6.3 | 67.3