
Multi-Modal Grounded Planning and Efficient Replanning For Learning Embodied Agents with A Few Examples

About

Learning a perception and reasoning module for robotic assistants to plan steps to perform complex tasks based on natural language instructions often requires a large amount of free-form language annotation, especially for short high-level instructions. To reduce the cost of annotation, large language models (LLMs) are used as planners with little data. However, when elaborating the steps, even the state-of-the-art planner that uses LLMs mostly relies on linguistic common sense, often neglecting the state of the environment when the command is received, resulting in inappropriate plans. To generate plans grounded in the environment, we propose FLARE (Few-shot Language with environmental Adaptive Replanning Embodied agent), which improves task planning using both the language command and environmental perception. As language instructions often contain ambiguities or incorrect expressions, we additionally propose to correct the mistakes using visual cues from the agent. The proposed scheme allows us to use only a few language pairs thanks to the visual cues and outperforms state-of-the-art approaches. Our code is available at https://github.com/snumprlab/flare.
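The idea of correcting a plan with what the agent actually sees can be illustrated with a small sketch. This is not the FLARE implementation: the function names, the plan format, and the object names are all hypothetical, and plain string similarity from Python's difflib stands in for whatever visual or semantic matching the real agent uses. The sketch only shows the general pattern of replacing a plan target that was never observed with the closest object that was.

```python
from difflib import SequenceMatcher


def closest_observed(name, observed):
    """Return the observed object whose name is most similar to `name`.

    String similarity is an assumed stand-in for the agent's own matching.
    """
    return max(
        observed,
        key=lambda o: SequenceMatcher(None, name.lower(), o.lower()).ratio(),
    )


def replan(plan, observed):
    """Replace plan targets that were not observed with the closest observed object."""
    observed_set = set(observed)
    fixed = []
    for action, target in plan:
        # Only touch steps whose target the agent has not seen.
        if target not in observed_set and observed:
            target = closest_observed(target, observed)
        fixed.append((action, target))
    return fixed


# Hypothetical plan from a language-only planner and hypothetical observations.
plan = [("PickUp", "Cup"), ("Put", "Table")]
observed = ["Mug", "DiningTable", "Sink"]
print(replan(plan, observed))
```

In this toy setting the instruction mentions a cup and a table, neither of which was detected, so the plan is rewritten to use the mug and the dining table that the agent did observe. Steps whose targets were observed are left unchanged.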

Taewoong Kim, Byeonghwi Kim, Jonghyun Choi • 2024

Related benchmarks

Task                     Dataset                            Metric             Result  Rank
Instruction Execution    VirtualHome (unseen domains)       Success Rate       46.64   15
Embodied Task Planning   VirtualHome (Seen)                 Simple Success     54.69   10
Embodied Task Planning   RLBench Unseen domains             Success Rate       34.37   6
Embodied Task Planning   ALFWorld (seen domains)            Success Rate (SR)  21.22   6
Embodied Task Planning   RLBench Seen domains               Success Rate       53.05   6
Embodied Task Planning   VirtualHome (unseen domains)       Success Rate       40.07   6
Embodied Task Planning   ALFWorld (unseen domains)          Success Rate (SR)  11.31   6
Few-shot task expansion  VirtualHome unseen domains 1-shot  SR                 42.17   5
Few-shot task expansion  VirtualHome unseen domains 5-shot  Success Rate       46.64   5
Few-shot task expansion  ALFWorld 1-shot (unseen domains)   Success Rate (SR)  12.28   5
Showing 10 of 16 rows
