Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

About

Can world knowledge learned by large language models (LLMs) be used to act in interactive environments? In this paper, we investigate the possibility of grounding high-level tasks, expressed in natural language (e.g. "make breakfast"), to a chosen set of actionable steps (e.g. "open fridge"). While prior work focused on learning from explicit step-by-step examples of how to act, we surprisingly find that if pre-trained LMs are large enough and prompted appropriately, they can effectively decompose high-level tasks into mid-level plans without any further training. However, the plans produced naively by LLMs often cannot map precisely to admissible actions. We propose a procedure that conditions on existing demonstrations and semantically translates the plans to admissible actions. Our evaluation in the recent VirtualHome environment shows that the resulting method substantially improves executability over the LLM baseline. The conducted human evaluation reveals a trade-off between executability and correctness but shows a promising sign towards extracting actionable knowledge from language models. Website at https://huangwl18.github.io/language-planner
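The translation step described above, mapping a free-form plan step to the closest admissible action, can be illustrated with a minimal sketch. The abstract does not specify how similarity is measured, so this example assumes a simple bag-of-words cosine similarity as a stand-in for a learned sentence embedding. The function names, the action list, and the example step are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased, whitespace-tokenized word counts (a stand-in for a sentence embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def translate_step(step: str, admissible_actions: list[str]) -> str:
    """Return the admissible action most similar to a free-form plan step."""
    step_vec = bag_of_words(step)
    return max(admissible_actions, key=lambda action: cosine(step_vec, bag_of_words(action)))


# Hypothetical action set and LLM-generated step, for illustration only.
ADMISSIBLE_ACTIONS = ["open fridge", "close fridge", "grab milk", "walk to kitchen"]
print(translate_step("open the fridge door", ADMISSIBLE_ACTIONS))
```

Word overlap cannot match paraphrases that share no tokens (for example "get" versus "grab"), which is presumably why a semantic measure is needed in practice. The sketch only shows the nearest-neighbour structure of the mapping.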

Wenlong Huang, Pieter Abbeel, Deepak Pathak, Igor Mordatch • 2022

Related benchmarks

Task	Dataset	Metric	Result
Visual Planning for Assistance	Visual Planning for Assistance	Mean Accuracy (mAcc)	28.7	36
Continual Instruction Following	ALFRED	Success Rate (SR)	18.22	28
Text-based Task Completion	Textworld	Mean Normalised Score	62.25	18
Text-based Task Completion	ScienceWorld	Mean Normalised Score	26.47	18
Text-based Task Completion	TW Express	Mean Normalised Score	48.93	18
Text-based Task Completion	AlfWorld	Mean Normalised Score	0.00	18
Text-based Task Completion	Jericho	Mean Normalised Score	2.21	18
Embodied Task Planning	VirtualHome (Seen)	--	--	18
Continual Instruction Following	VirtualHome	Success Rate (SR)	20.59	15
Continual Instruction Following	CARLA	Success Rate (SR)	10.44	12

Showing 10 of 44 rows
