Interactive Reasoning on Timely-Eval
[Chart: Zork1 Score over time. Highlighted point: Gemini2.5-pro, 34.9, Jan 23, 2026. Available metrics: Zork1 Score, Advent Score, Enchanter Score, Detective Score.]
Evaluation Results
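| Method          | Details        | Date    | Zork1 Score | Advent Score | Enchanter Score | Detective Score |
|-----------------|----------------|---------|-------------|--------------|-----------------|-----------------|
| Gemini2.5-pro   | variant=pro    | 2026.01 | 34.9        | 50.7         | 34.1            | 71.9            |
| GPT-5.1(medium) | variant=medium | 2026.01 | 34.1        | 57.6         | 24.4            | 105             |
| TimelyLM-8B     | size=8B        | 2026.01 | 27.5        | 48.5         | 29.5            | 88.1            |
| DeepSeek-V3.2   |                | 2026.01 | 24.9        | 48.7         | 15.9            | 63.2            |
| Qwen3-32B       | size=32B       | 2026.01 | 14.4        | 38.2         | 11.7            | 70              |
| Qwen3-14B       | size=14B       | 2026.01 | 9.8         | 34.9         | 9.5             | 50.5            |
| Qwen3-8B        | size=8B        | 2026.01 | 2.3         | 36           | 5.2             | 54.1            |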