Long-horizon procedural planning on EgoPlan-Bench Out-of-Domain
Best result: 54.37 Success Rate (GPT-5.1)
[Chart: Success Rate by date; point shown at Mar 9, 2026]
Updated 1mo ago
Evaluation Results
Method            Details                     Date     Success Rate
GPT-5.1           Base Model=-, Instruct...   2026.03  54.37
PlanAgent + Mem.  Base Model=Qwen3-VL-8B...   2026.03  54.3
PlanAgent + Mem.  Base Model=Qwen3-VL-8B...   2026.03  50.17
Video-LLaMA       Base Model=LLaMA2-Chat...   2026.03  44.42
PlanAgent + Mem.  Base Model=Qwen3-VL-8B...   2026.03  43.31
Video-LLaMA       Base Model=LLaMA2-Chat...   2026.03  40.52
GPT-4V            Base Model=-, Instruct...   2026.03  36.9
PlanAgent (Ours)  Base Model=Qwen3-VL-8B...   2026.03  35.72
Video-LLaMA       Base Model=LLaMA2-Chat...   2026.03  30.44