Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity

About

Large language models (LLMs) exhibit advanced reasoning skills, enabling robots to comprehend natural language instructions and strategically plan high-level actions through proper grounding. However, LLM hallucination may result in robots confidently executing plans that are misaligned with user goals or even unsafe in critical scenarios. Additionally, inherent ambiguity in natural language instructions can introduce uncertainty into the LLM's reasoning and planning processes. We propose introspective planning, a systematic approach that aligns the LLM's uncertainty with the inherent ambiguity of the task. Our approach constructs a knowledge base containing introspective reasoning examples as post-hoc rationalizations of human-selected safe and compliant plans, which are retrieved during deployment. Evaluations on three tasks, including a newly introduced safe mobile manipulation benchmark, demonstrate that introspection substantially improves both compliance and safety over state-of-the-art LLM-based planning methods. Furthermore, we empirically show that introspective planning, in combination with conformal prediction, achieves tighter confidence bounds, maintaining statistical success guarantees while minimizing unnecessary user clarification requests. The webpage and code are accessible at https://introplan.github.io.

Kaiqu Liang, Zixu Zhang, Jaime Fernández Fisac • 2024
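
The abstract pairs introspective planning with conformal prediction to decide when the robot should ask for clarification. The sketch below is a generic split conformal prediction routine, not the authors' implementation. It assumes each candidate plan comes with a model probability, and that a calibration set of nonconformity scores (here, 1 minus the probability assigned to the correct option) is available. The function names, the option labels, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def conformal_quantile(calibration_scores, epsilon):
    """Return the ceil((n + 1) * (1 - epsilon))-th smallest calibration score.

    calibration_scores: nonconformity scores of the correct option on held-out
    calibration tasks (assumed here to be 1 - model probability).
    epsilon: target error rate, so the target coverage is 1 - epsilon.
    """
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - epsilon))
    if rank > n:
        # Too few calibration points for this coverage level: keep every option.
        return float("inf")
    return sorted(calibration_scores)[rank - 1]


def prediction_set(option_probs, qhat):
    """Keep every option whose nonconformity score (1 - p) is at most qhat."""
    return [opt for opt, p in option_probs.items() if 1 - p <= qhat]


def plan_or_ask(option_probs, qhat):
    """Execute when exactly one option survives; otherwise ask the user."""
    options = prediction_set(option_probs, qhat)
    if len(options) == 1:
        return ("execute", options)
    return ("ask", options)


# Hypothetical calibration scores and target error rate.
scores = [0.10, 0.20, 0.05, 0.60, 0.15, 0.40, 0.08, 0.55, 0.12]
qhat = conformal_quantile(scores, epsilon=0.2)

# An unambiguous instruction: one option dominates.
clear = plan_or_ask({"A": 0.90, "B": 0.06, "C": 0.04}, qhat)

# An ambiguous instruction: two options are nearly tied.
ambiguous = plan_or_ask({"A": 0.48, "B": 0.47, "C": 0.05}, qhat)
```

Under these assumptions, a smaller prediction set means fewer clarification requests at the same coverage level, which is the sense in which tighter confidence bounds reduce unnecessary questions to the user.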

Related benchmarks

Task	Dataset	Result
Introspective Planning	KitchenAmbig (In-Distribution)	Average Accuracy 77.1	10
Introspective Planning	KitchenAmbig (OOD)	Average Accuracy 86.3	10
Preference-aligned decision making	Housekeep (test)	Accuracy 30.3	10
Preference-aligned decision making	AmbiK (test)	Accuracy 51.7	10
Preference-aligned decision making	Mobile Manipulation (test)	Accuracy 30.9	10
Robotic Task Planning	G-Dataset zero-shot	TSR 82	9
Robotic Task Planning	R-Dataset zero-shot	TSR 87.5	9
Mobile Manipulation	Mobile Manipulation	ESR 94.5	7
Safe Mobile Manipulation	Safe Mobile Manipulation	ESR 93	7
Safe Mobile Manipulation	Safe Mobile Manipulation GPT-3.5 (test)	SR 91	7

