General Embodied AI Performance: Aggregate of Sokoban, FrozenLake, Navigation, and PrimitiveSkill
[Chart: Overall Success Rate over time. Highlighted result: AtlasVA, 93 (May 18, 2026).]
Evaluation Results
| Method            | Model Category | Date    | Overall Success Rate |
|-------------------|----------------|---------|----------------------|
| AtlasVA           | Ours, P...     | 2026.05 | 93                   |
| VAGEN             | Open-So...     | 2026.05 | 78                   |
| o3                | Proprie...     | 2026.05 | 71                   |
| GPT-5             | Proprie...     | 2026.05 | 69                   |
| Claude Sonnet 4.5 | Proprie...     | 2026.05 | 62                   |
| o4-mini           | Proprie...     | 2026.05 | 60                   |
| GPT-4o            | Proprie...     | 2026.05 | 60                   |
| Gemini 2.5 flash  | Proprie...     | 2026.05 | 58                   |
| Qwen2.5-VL-72B    | Open-So...     | 2026.05 | 55                   |
| Gemini 2.5 Pro    | Proprie...     | 2026.05 | 51                   |
| Claude Sonnet 3.7 | Proprie...     | 2026.05 | 51                   |
| Gemini 2.0        | Proprie...     | 2026.05 | 39                   |
| Qwen2.5-VL-7B     | Open-So...     | 2026.05 | 19                   |
| VLM-R1-3B         | Open-So...     | 2026.05 | 10                   |
| Qwen2.5-VL-3B     | Open-So...     | 2026.05 | 9                    |