FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing

About

We develop a cost-efficient neurosymbolic agent for challenging multi-turn image editing tasks such as "Detect the bench in the image while recoloring it to pink. Also, remove the cat for a clearer view and recolor the wall to yellow." The agent combines fast, high-level subtask planning by large language models (LLMs) with slow, accurate, local A$^*$ search over tool use per subtask to find a cost-efficient toolpath, i.e., a sequence of calls to AI tools. To reduce the cost of A$^*$ search on similar subtasks, we use LLMs to perform inductive reasoning over previously successful toolpaths, continuously extracting and refining frequently used subroutines and reusing them as new tools for future tasks. Planning is adaptive and fast-slow: the higher-level subroutines are explored first, and the low-level A$^*$ search is activated only when they fail. The reusable symbolic subroutines considerably reduce exploration cost on the same types of subtasks applied to similar images, yielding a human-like fast-slow toolpath agent, "FaSTA$^*$": LLMs first attempt fast subtask planning followed by rule-based subroutine selection per subtask, which is expected to cover most tasks, while slow A$^*$ search is triggered only for novel and challenging subtasks. In comparisons with recent image editing approaches, we demonstrate that FaSTA$^*$ is significantly more computationally efficient while remaining competitive with the state-of-the-art baseline in success rate. Our code and data are available at https://github.com/tianyi-lab/FaSTAR.
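The fast-slow control flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tool graph, the tool names, the costs, the `succeeds` quality check, and the zero heuristic are all hypothetical, and caching the A$^*$ result is only a simple stand-in for the LLM-based subroutine mining the abstract describes.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tool graph: state -> [(next_state, tool_name, cost_in_seconds)].
Graph = Dict[str, List[Tuple[str, str, float]]]

TOOL_GRAPH: Graph = {
    "input": [
        ("mask", "detector_small", 1.0),
        ("mask", "detector_large", 4.0),
        ("done", "end_to_end_editor", 10.0),
    ],
    "mask": [("done", "inpainter", 3.0)],
}


def a_star(
    graph: Graph,
    start: str,
    goal: str,
    heuristic: Callable[[str], float],
) -> Optional[Tuple[float, List[str]]]:
    """Slow path: return (total_cost, toolpath) for the cheapest path, or None."""
    frontier = [(heuristic(start), 0.0, start, [])]
    best_cost = {start: 0.0}
    while frontier:
        _, cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return cost, path
        if cost > best_cost.get(state, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for next_state, tool, step_cost in graph.get(state, []):
            new_cost = cost + step_cost
            if new_cost < best_cost.get(next_state, float("inf")):
                best_cost[next_state] = new_cost
                priority = new_cost + heuristic(next_state)
                heapq.heappush(
                    frontier, (priority, new_cost, next_state, path + [tool])
                )
    return None


def plan_subtask(
    subtask: str,
    subroutines: Dict[str, List[str]],
    succeeds: Callable[[List[str]], bool],
    graph: Graph = TOOL_GRAPH,
    start: str = "input",
    goal: str = "done",
) -> Tuple[str, List[str]]:
    """Try a stored subroutine first; fall back to A* only if it is missing or fails."""
    routine = subroutines.get(subtask)
    if routine is not None and succeeds(routine):
        return "fast", routine
    # Zero heuristic (assumption): A* then behaves like uniform-cost search.
    result = a_star(graph, start, goal, lambda state: 0.0)
    if result is None:
        raise ValueError(f"no toolpath found for subtask {subtask!r}")
    _, path = result
    # Stand-in for subroutine mining: remember the successful toolpath.
    subroutines[subtask] = path
    return "slow", path


subroutines: Dict[str, List[str]] = {}
print(plan_subtask("remove_object", subroutines, lambda path: True))
# ('slow', ['detector_small', 'inpainter'])
print(plan_subtask("remove_object", subroutines, lambda path: True))
# ('fast', ['detector_small', 'inpainter'])
```

The first call has no stored subroutine, so it pays for the search. The second call reuses the stored toolpath and skips A$^*$ entirely, which is the source of the cost savings the abstract claims for repeated subtask types.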

Advait Gupta, Rishie Raj, Dang Nguyen, Tianyi Zhou • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image-Only Editing | CoSTA Image-Only Tasks 1.0 (test) | Avg Human Score: 92 | 28 |
| Text+Image Editing | CoSTA Text+Image Tasks 1.0 (test) | Avg Human Score: 92 | 21 |
| Multi-turn image editing | CoSTA All Tasks 1.0 (test) | Avg Human Score: 0.91 | 7 |
| Image Editing | FaSTA Evaluation Benchmark | Avg Quality Score: 91 | 3 |
| Multimodal Editing | Complex-Edit 1-3 Subtasks | Cost (s): 35.87 | 2 |
| Multimodal Editing | Complex-Edit 4-5 Subtasks | Editing Cost (s): 54.17 | 2 |
| Multimodal Editing | Complex-Edit 6-8 Subtasks | Editing Cost (s): 73.2 | 2 |
| Multimodal Editing | Complex-Edit Overall | Average Cost (s): 55.12 | 2 |
