Long-horizon procedural planning on EgoPlan-Bench All
[Chart: Success Rate over time. Best result: PlanAgent + Mem., 58.72 (Mar 9, 2026). Updated 1mo ago.]
Evaluation Results
Method               Details                      Date      Success Rate
PlanAgent + Mem.     Base Model=Qwen3-VL-8B...    2026.03   58.72
GPT-5.1              Base Model=-, Instruct...    2026.03   54.78
PlanAgent + Mem.     Base Model=Qwen3-VL-8B...    2026.03   53.29
Video-LLaMA          Base Model=LLaMA2-Chat...    2026.03   51.83
Video-LLaMA          Base Model=LLaMA2-Chat...    2026.03   48.94
PlanAgent + Mem.     Base Model=Qwen3-VL-8B...    2026.03   43.63
GPT-4V               Base Model=-, Instruct...    2026.03   37.98
PlanAgent (Ours)     Base Model=Qwen3-VL-8B...    2026.03   35.81
Gemini-Pro-Vision    Base Model=-, Instruct...    2026.03   30.46
SEED-LLaMA           Base Model=LLaMA2-Chat...    2026.03   29.93
Video-LLaMA          Base Model=LLaMA2-Chat...    2026.03   28.58
Qwen-VL-Chat         Base Model=Qwen-7B, In...    2026.03   27.69
DeepSeek-VL-Chat     Base Model=DeepSeek-LL...    2026.03   27.57