SE-GA: Memory-Augmented Self-Evolution for GUI Agents
About
Autonomous Graphical User Interface (GUI) agents often struggle with multi-step tasks because their context windows are constrained and their static policies fail to adapt to dynamic environments. To address these limitations, this work proposes the Self-Evolving GUI Agent (SE-GA), a novel framework that integrates hierarchical memory structures with an iterative self-improvement mechanism. At the core of the approach is Test-Time Memory Extension (TTME), which supports long-term planning by dynamically retrieving episodic, semantic, and experiential memories to supply salient context during inference. To enable continuous learning, we introduce Memory-Augmented Self-Evolution (MASE), a training pipeline that uses the data collected by TTME to stabilize and enhance the agent's foundational policy. Extensive evaluations on both offline and online benchmarks show that SE-GA achieves state-of-the-art performance, reaching 89.0% average accuracy on ScreenSpot and a 75.8% success rate on the challenging AndroidControl-High dataset. Furthermore, significant improvements on the AndroidWorld benchmark indicate stronger generalization to dynamic environments. Open source code: https://github.com/jinshilong-dev/SE-GA
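The abstract names three memory types that TTME retrieves at inference time but does not describe the retrieval mechanism. The following is a minimal illustrative sketch, not the authors' implementation: it assumes a flat in-memory store and a simple word-overlap (Jaccard) relevance score, and every name in it (`MemoryItem`, `retrieve`, `build_context`) is hypothetical.

```python
from dataclasses import dataclass

# Memory types named in the abstract; everything else here is an assumption.
MEMORY_KINDS = ("episodic", "semantic", "experiential")


@dataclass(frozen=True)
class MemoryItem:
    kind: str  # one of MEMORY_KINDS
    text: str  # free-text memory content (hypothetical representation)


def tokenize(text):
    """Lowercase whitespace tokenization into a set of words."""
    return set(text.lower().split())


def score(query, item):
    """Jaccard word overlap; a stand-in for whatever relevance model is used."""
    q, t = tokenize(query), tokenize(item.text)
    if not q or not t:
        return 0.0
    return len(q & t) / len(q | t)


def retrieve(query, store, k_per_kind=1):
    """Return up to k_per_kind relevant memories for each memory kind."""
    result = {}
    for kind in MEMORY_KINDS:
        candidates = [m for m in store if m.kind == kind]
        ranked = sorted(candidates, key=lambda m: score(query, m), reverse=True)
        # Drop memories with no overlap so irrelevant context is not injected.
        result[kind] = [m for m in ranked[:k_per_kind] if score(query, m) > 0]
    return result


def build_context(query, store, k_per_kind=1):
    """Assemble the task plus retrieved memories into a prompt-style context."""
    lines = [f"Task: {query}"]
    for kind, items in retrieve(query, store, k_per_kind).items():
        for m in items:
            lines.append(f"[{kind}] {m.text}")
    return "\n".join(lines)
```

A practical system would likely replace the overlap score with learned embeddings and bound the context by token budget; the sketch only shows the shape of per-type retrieval followed by context assembly.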
Related benchmarks
| Task | Dataset | Metric | Result (%) | Rank |
|---|---|---|---|---|
| GUI Agent Task | AndroidWorld | Success Rate | 39 | 188 |
| GUI Planning | AndroidControl High | Success Rate | 75.8 | 30 |
| GUI Grounding | ScreenSpot V1 (Overall) | Average Accuracy | 89 | 22 |
| GUI Grounding | ScreenSpot Mobile | Text Accuracy | 96.3 | 18 |
| GUI Grounding | ScreenSpot Desktop | Text Accuracy | 95.9 | 18 |
| GUI Grounding | ScreenSpot Web | Text Accuracy | 91 | 18 |
| Low-level GUI Execution | AndroidControl Low | Grounding Accuracy | 92.5 | 9 |
| Cross-app GUI Navigation | GUIOdyssey | Grounding Accuracy | 87.4 | 9 |