
Graph-enhanced Large Language Models in Asynchronous Plan Reasoning

About

Planning is a fundamental property of human intelligence. Reasoning about asynchronous plans is challenging since it requires sequential and parallel planning to optimize time costs. Can large language models (LLMs) succeed at this task? Here, we present the first large-scale study investigating this question. We find that a representative set of closed and open-source LLMs, including GPT-4 and LLaMA-2, behave poorly when not supplied with illustrations about the task-solving process in our benchmark AsyncHow. We propose a novel technique called Plan Like a Graph (PLaG) that combines graphs with natural language prompts and achieves state-of-the-art results. We show that although PLaG can boost model performance, LLMs still suffer from drastic degradation when task complexity increases, highlighting the limits of utilizing LLMs for simulating digital devices. We see our study as an exciting step towards using LLMs as efficient autonomous agents. Our code and data are available at https://github.com/fangru-lin/graph-llm-asynchow-plan.
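To make "optimize time costs" concrete, here is a minimal sketch of how the shortest completion time of an asynchronous plan could be computed once the plan is written as a dependency graph. It is an illustration and not the authors' PLaG implementation or their benchmark code. The function name, the step names, the durations, and the assumption that any number of independent steps can run in parallel are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def min_completion_time(durations, prerequisites):
    """Shortest total time for a plan, assuming unlimited parallel workers.

    durations:      {step: time cost}
    prerequisites:  {step: steps that must finish before it starts}
    """
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph = {step: tuple(prerequisites.get(step, ())) for step in durations}
    finish = {}
    for step in TopologicalSorter(graph).static_order():
        # A step starts as soon as its slowest prerequisite has finished.
        start = max((finish[p] for p in graph[step]), default=0)
        finish[step] = start + durations[step]
    # The plan is done when the last step finishes (the critical path length).
    return max(finish.values(), default=0)


# Hypothetical example plan: two independent steps feed a final step.
durations = {"boil water": 10, "chop vegetables": 5, "cook soup": 15}
prerequisites = {"cook soup": ["boil water", "chop vegetables"]}

parallel_time = min_completion_time(durations, prerequisites)
sequential_time = sum(durations.values())
```

In this example the two preparation steps overlap, so the asynchronous plan takes less time than running every step one after another. That gap is what a model has to find when it reasons about which steps are sequential and which can run in parallel.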

Fangru Lin, Emanuele La Malfa, Valentin Hofmann, Elle Michelle Yang, Anthony Cohn, Janet B. Pierrehumbert • 2024

Related benchmarks

Task                      Dataset       Result           Rank
Tool Planning             ToolBench     EM (%): 21.95    24
Tool Planning             UltraTool     EM (%): 10.52    24
Tool Sequence Prediction  HuggingFace   Tool F1: 57.08   24
Tool Planning             HuggingFace   EM (%): 8.53     24
Tool Planning             Multimedia    EM (%): 12.84    24
Tool Sequence Prediction  Multimedia    Tool F1: 56.58   24
