
SE-GA: Memory-Augmented Self-Evolution for GUI Agents

About

Autonomous Graphical User Interface (GUI) agents often struggle with multi-step tasks because of constrained context windows and static policies that fail to adapt to dynamic environments. To address these limitations, this work proposes the Self-Evolving GUI Agent (SE-GA), a framework that integrates hierarchical memory structures with an iterative self-improvement mechanism. At the core of the approach is Test-Time Memory Extension (TTME), which supports long-term planning by dynamically retrieving episodic, semantic, and experiential memories to supply salient context during inference. To ensure continuous learning, the authors introduce Memory-Augmented Self-Evolution (MASE), a training pipeline that uses the data collected by TTME to stabilize and enhance the agent's foundational policy. Extensive evaluations across both offline and online benchmarks demonstrate that SE-GA achieves state-of-the-art performance, reaching success rates of 89.0% on ScreenSpot and 75.8% on the challenging AndroidControl-High dataset. Furthermore, significant improvements on the AndroidWorld benchmark highlight its generalization to dynamic environments. Open source code: https://github.com/jinshilong-dev/SE-GA

Shilong Jin, Lanjun Wang, Zhuosheng Zhang • 2026
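
The abstract describes TTME only at a high level, so the following is a minimal sketch of the general idea of retrieving from three memory types at inference time, not the authors' implementation. The names `MemoryItem`, `retrieve`, and `build_context` are hypothetical, and the token-overlap scoring is an assumption chosen to keep the example self-contained; the actual retrieval method is not stated on this page.

```python
from __future__ import annotations

from dataclasses import dataclass

# The three memory types named in the abstract.
MEMORY_KINDS = ("episodic", "semantic", "experiential")


@dataclass
class MemoryItem:
    kind: str  # one of MEMORY_KINDS
    text: str  # free-text memory entry (hypothetical representation)


def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokenization; a stand-in for a real encoder."""
    return set(text.lower().split())


def retrieve(query: str, store: list[MemoryItem], per_kind: int = 1) -> dict[str, list[str]]:
    """Return up to `per_kind` entries per memory kind, ranked by token overlap.

    Token overlap is an illustrative assumption, not the paper's scoring rule.
    Entries with zero overlap are skipped.
    """
    query_tokens = tokenize(query)
    result: dict[str, list[str]] = {}
    for kind in MEMORY_KINDS:
        scored = []
        for item in store:
            if item.kind != kind:
                continue
            overlap = len(query_tokens & tokenize(item.text))
            if overlap > 0:
                scored.append((overlap, item.text))
        # Stable sort: ties keep their insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        result[kind] = [text for _, text in scored[:per_kind]]
    return result


def build_context(query: str, store: list[MemoryItem]) -> str:
    """Assemble the task and the retrieved memories into one prompt string."""
    lines = [f"Task: {query}"]
    for kind, texts in retrieve(query, store).items():
        for text in texts:
            lines.append(f"[{kind}] {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    store = [
        MemoryItem("episodic", "step 1 tapped the settings icon"),
        MemoryItem("semantic", "the wifi toggle lives under network settings"),
        MemoryItem("semantic", "the camera app has a shutter button"),
        MemoryItem("experiential", "scroll down when the wifi toggle is not visible"),
    ]
    print(build_context("open settings and enable wifi", store))
```

Keeping the three memory kinds in separate ranked lists means each kind contributes to the context even when one kind dominates on raw score; whether SE-GA balances its memories this way is not specified here.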

Related benchmarks

Task                       Dataset                    Metric                Result   Rank
GUI Agent Task             AndroidWorld               Success Rate          39       188
GUI planning               AndroidControl High        Success Rate (SR)     75.8     30
GUI Grounding              ScreenSpot V1 (Overall)    Average Accuracy      89       22
GUI Grounding              ScreenSpot mobile          Text Accuracy         96.3     18
GUI Grounding              ScreenSpot Desktop         Text Accuracy         95.9     18
GUI Grounding              ScreenSpot Web             Text Accuracy         91       18
Low-level GUI Execution    AndroidControl Low         Grounding Accuracy    92.5     9
Cross-app GUI Navigation   GUIOdyssey                 Grounding Accuracy    87.4     9
