Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control
About
Building agents with large language models (LLMs) for computer control is a burgeoning research area, where the agent receives computer states and performs actions to complete complex tasks. Previous computer agents have demonstrated the benefits of in-context learning (ICL); however, their performance is hindered by several issues. First, the limited context length of LLMs and complex computer states restrict the number of exemplars, as a single webpage can consume the entire context. Second, the exemplars in current methods, such as high-level plans and multi-choice questions, cannot represent complete trajectories, leading to suboptimal performance in long-horizon tasks. Third, existing computer agents rely on task-specific exemplars and overlook the similarity among tasks, resulting in poor generalization to novel tasks. To address these challenges, we introduce Synapse, a computer agent featuring three key components: i) state abstraction, which filters out task-irrelevant information from raw states, allowing more exemplars within the limited context, ii) trajectory-as-exemplar prompting, which prompts the LLM with complete trajectories of the abstracted states and actions to improve multi-step decision-making, and iii) exemplar memory, which stores the embeddings of exemplars and retrieves them via similarity search for generalization to novel tasks. We evaluate Synapse on MiniWoB++, a standard task suite, and Mind2Web, a real-world website benchmark. In MiniWoB++, Synapse achieves a 99.2% average success rate (a 10% relative improvement) across 64 tasks using demonstrations from only 48 tasks. Notably, Synapse is the first ICL method to solve the book-flight task in MiniWoB++. Synapse also exhibits a 56% relative improvement in average step success rate over the previous state-of-the-art prompting scheme in Mind2Web.
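The exemplar-memory component described above can be illustrated with a minimal sketch. This is an assumed interface, not the authors' implementation: a real system would use a learned text encoder for embeddings, whereas here a toy bag-of-words vector and cosine similarity stand in, and the `ExemplarMemory` class and its method names are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" standing in for a learned text encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ExemplarMemory:
    """Stores (task embedding, trajectory) pairs; retrieves by similarity."""

    def __init__(self):
        self.entries = []  # list of (embedding, trajectory) pairs

    def add(self, task: str, trajectory: list):
        self.entries.append((embed(task), trajectory))

    def retrieve(self, task: str, k: int = 1) -> list:
        # Rank stored exemplars by similarity to the new task description.
        q = embed(task)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [traj for _, traj in ranked[:k]]

memory = ExemplarMemory()
memory.add("click the button labeled submit",
           [("state: <button>submit</button>", "CLICK(submit)")])
memory.add("book a flight from NYC to LA",
           [("state: flight search form", "TYPE(origin, NYC)")])

# A novel but similar task retrieves the closest stored trajectory,
# which would then be prompted to the LLM as a complete exemplar.
best = memory.retrieve("click the submit button")[0]
```

In the full method, the retrieved entries are complete trajectories of abstracted states and actions, so the same memory serves both retrieval and trajectory-as-exemplar prompting.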
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Embodied Task Completion | EB-Habitat | Avg Success Rate | 47.2 | 32 |
| GUI Navigation | Mind2Web (Cross-Website) | Element Accuracy | 29.4 | 23 |
| Embodied Instruction Following | EB-ALFRED 1.0 (test) | Success Rate (Avg) | 38.8 | 20 |
| Web Agent Navigation | Mind2Web Cross-Task 1.0 | Element Accuracy | 35 | 16 |
| Web Agent Navigation | Mind2Web Cross-Domain 1.0 | Element Accuracy | 30.5 | 16 |
| Web Agent Navigation | Mind2Web All 1.0 | Element Accuracy | 0.313 | 16 |
| Web Action Generation Efficiency | Mind2Web Cross-Task | To_Pro (Steps/Time) | 1.55e+3 | 16 |
| Web Action Generation Efficiency | Mind2Web (Cross-Website) | To_Pro (Steps/Time) | 1.85e+3 | 16 |
| Web Action Generation Efficiency | Mind2Web Cross-Domain | To_Pro (Steps/Time) | 1.66e+3 | 16 |
| Web Action Generation Efficiency | Mind2Web (All) | To_Pro (Steps/Time) | 1.76e+3 | 16 |