Long-horizon procedural planning on EgoPlan-Bench In-Domain
[Chart: Success Rate; top entry PlanAgent + Mem., 62.46, Mar 9, 2026]
Evaluation Results
Method            Details                    Date     Success Rate
PlanAgent + Mem.  Base Model=Qwen3-VL-8B...  2026.03  62.46
PlanAgent + Mem.  Base Model=Qwen3-VL-8B...  2026.03  56.49
GPT-5.1           Base Model=-, Instruct...  2026.03  55.08
Video-LLaMA       Base Model=LLaMA2-Chat...  2026.03  54.65
Video-LLaMA       Base Model=LLaMA2-Chat...  2026.03  52.14
PlanAgent + Mem.  Base Model=Qwen3-VL-8B...  2026.03  44.68
GPT-4V            Base Model=-, Instruct...  2026.03  38.4
PlanAgent (Ours)  Base Model=Qwen3-VL-8B...  2026.03  36.6
Video-LLaMA       Base Model=LLaMA2-Chat...  2026.03  27.88