
Long-horizon procedural planning on EgoPlan-Bench Out-of-Domain

Best Success Rate: 54.37 (GPT-5.1)

[Chart: Success Rate by date, Mar 9, 2026]
Updated 1mo ago

Evaluation Results

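Method           Date      Success Rate
GPT-5.1          2026.03   54.37
(not captured)   2026.03   54.3
(not captured)   2026.03   50.17
(not captured)   2026.03   44.42
(not captured)   2026.03   43.31
(not captured)   2026.03   40.52
(not captured)   2026.03   36.9
(not captured)   2026.03   35.72
(not captured)   2026.03   30.44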