Long-horizon procedural planning on EgoPlan-Bench Out-of-Domain

54.37 Success Rate

GPT-5.1

Updated 4 months ago

Evaluation Results

Method	Links	Success Rate
GPT-5.1 2026.03		54.37
PlanAgent + Mem. 2026.03		54.3
PlanAgent + Mem. 2026.03		50.17
Video-LLaMA 2026.03		44.42
PlanAgent + Mem. 2026.03		43.31
Video-LLaMA 2026.03		40.52
GPT-4V 2026.03		36.9
PlanAgent (Ours) 2026.03		35.72
Video-LLaMA 2026.03		30.44