Maze Navigation on Maze Hard
[Chart: Accuracy (y-axis) by date (x-axis), Nov 28, 2025 to May 19, 2026. Highlighted point: GPT-5, 97.66.]
Updated 14d ago
Evaluation Results
| Method | Configuration | Date | Accuracy |
|---|---|---|---|
| GPT-5 | category=Proprietary M... | 2025.11 | 97.66 |
| OpenAI o3 | category=Proprietary M... | 2025.11 | 93.36 |
| PTRM | # Params=7M, K (stocha... | 2026.05 | 86.73 |
| TRM | # Params=7M, architect... | 2026.05 | 85.3 |
| Standard TRM, our reproduction | # Params=7M, architect... | 2026.05 | 83.8 |
| OpenAI o4-mini | category=Proprietary M... | 2025.11 | 78.52 |
| HRM | # Params=27M | 2026.05 | 74.5 |
| Claude 4.5 Sonnet | category=Proprietary M... | 2025.11 | 68.36 |
| Gemini 2.5 Pro | category=Proprietary M... | 2025.11 | 63.28 |
| WMAct | | 2025.11 | 50.59 |
| PPO - Interactive | mode=interactive | 2025.11 | 36.52 |
| Qwen3-14B | category=Opensource Mo... | 2025.11 | 28.52 |
| PPO - EntirePlan | mode=single-turn output | 2025.11 | 26.51 |
| Qwen3-8B | category=Opensource Mo... | 2025.11 | 17.76 |
| GPT-4o | category=Proprietary M... | 2025.11 | 1.56 |
| Qwen2.5-32B-Instruct | category=Opensource Mo... | 2025.11 | 0.39 |
| Qwen3-8B-Own | backbone=Qwen3-8B | 2025.11 | 0.2 |
| Qwen2.5-7B-Instruct | category=Opensource Mo... | 2025.11 | 0 |