SE-GA: Memory-Augmented Self-Evolution for GUI Agents

About

Autonomous graphical user interface (GUI) agents often struggle with multi-step tasks because of constrained context windows and static policies that fail to adapt to dynamic environments. To address these limitations, this work proposes the Self-Evolving GUI Agent (SE-GA), a novel framework that integrates hierarchical memory structures with an iterative self-improvement mechanism. At the core of our approach is Test-Time Memory Extension (TTME), which supports long-term planning by dynamically retrieving episodic, semantic, and experiential memories that supply salient context during inference. To support continuous learning, we introduce Memory-Augmented Self-Evolution (MASE), a training pipeline that uses the data collected by TTME to stabilize and enhance the agent's foundational policy. Extensive evaluations on both offline and online benchmarks show that SE-GA achieves state-of-the-art performance, reaching 89.0% accuracy on ScreenSpot and a 75.8% success rate on the challenging AndroidControl-High dataset. Significant improvements on the AndroidWorld benchmark further highlight its superior generalization to dynamic environments. Open-source code: https://github.com/jinshilong-dev/SE-GA
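The abstract describes TTME and MASE only at a high level. The Python sketch below illustrates one way hierarchical memory retrieval could feed an inference-time prompt, and one way TTME rollouts might be turned into fine-tuning data. Everything here is a hypothetical illustration, not the SE-GA implementation: the names `HierarchicalMemory`, `build_prompt`, `Trajectory`, and `select_for_finetuning` are invented for this example, Jaccard keyword overlap stands in for whatever retriever the authors actually use, and keeping only successful rollouts is an assumed data-selection rule.

```python
from dataclasses import dataclass, field

# Hypothetical sketch only: tiers, scoring, and filtering are assumptions,
# not the SE-GA code in the linked repository.
TIERS = ("episodic", "semantic", "experiential")


@dataclass
class MemoryItem:
    tier: str  # one of TIERS
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase bag of words with punctuation stripped (stand-in for embeddings)."""
    words = (w.strip(".,:;!?").lower() for w in text.split())
    return {w for w in words if w}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class HierarchicalMemory:
    items: list[MemoryItem] = field(default_factory=list)

    def add(self, tier: str, text: str) -> None:
        if tier not in TIERS:
            raise ValueError(f"unknown tier: {tier}")
        self.items.append(MemoryItem(tier, text))

    def retrieve(self, query: str, k_per_tier: int = 1) -> dict[str, list[str]]:
        """Return the top-k items of each tier ranked by similarity to the query."""
        q = tokenize(query)
        out: dict[str, list[str]] = {}
        for tier in TIERS:
            cands = [m for m in self.items if m.tier == tier]
            # sorted() stays stable with reverse=True, so ties keep insertion order
            ranked = sorted(cands, key=lambda m: jaccard(q, tokenize(m.text)), reverse=True)
            out[tier] = [m.text for m in ranked[:k_per_tier]]
        return out


def build_prompt(task: str, screen: str, memory: HierarchicalMemory, k_per_tier: int = 1) -> str:
    """Assemble an inference-time context: task, screen, then retrieved memories."""
    lines = [f"Task: {task}", f"Screen: {screen}"]
    for tier, texts in memory.retrieve(f"{task} {screen}", k_per_tier).items():
        lines.extend(f"[{tier}] {t}" for t in texts)
    return "\n".join(lines)


@dataclass
class Trajectory:
    task: str
    steps: list[str]
    success: bool


def select_for_finetuning(trajs: list[Trajectory]) -> list[tuple[str, str]]:
    """Assumed MASE-style data step: keep successful rollouts and flatten them
    into (context, next action) pairs for supervised fine-tuning."""
    pairs: list[tuple[str, str]] = []
    for t in trajs:
        if not t.success:
            continue
        for i, action in enumerate(t.steps):
            pairs.append((f"Task: {t.task}\nHistory: {'; '.join(t.steps[:i])}", action))
    return pairs


# Usage with toy data
mem = HierarchicalMemory()
mem.add("episodic", "Opened the Settings app")
mem.add("episodic", "Scrolled the home screen")
mem.add("semantic", "Wi-Fi toggle lives under Network settings")
mem.add("semantic", "Camera app stores photos in Gallery")
mem.add("experiential", "To enable Wi-Fi: open Settings, tap Network, tap Wi-Fi toggle")
mem.add("experiential", "To take a photo: open Camera, tap shutter")

prompt = build_prompt("Turn on Wi-Fi", "Settings app main page with Network entry", mem)
print(prompt)

pairs = select_for_finetuning([
    Trajectory("Turn on Wi-Fi", ["tap Settings", "tap Network"], True),
    Trajectory("Take a photo", ["tap Camera"], False),
])
print(pairs)
```

Retrieving a fixed number of items per tier keeps each memory type represented in the context instead of letting one tier crowd out the others; a real agent would more likely rank with learned embeddings and enforce a token budget.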

Shilong Jin, Lanjun Wang, Zhuosheng Zhang • 2026

Related benchmarks

Task	Dataset	Metric	Result (%)
GUI Agent Task	AndroidWorld	Success Rate	39
GUI Planning	AndroidControl High	Success Rate	75.8
GUI Grounding	ScreenSpot V1 (Overall)	Average Accuracy	89.0
GUI Grounding	ScreenSpot Mobile	Text Accuracy	96.3
GUI Grounding	ScreenSpot Desktop	Text Accuracy	95.9
GUI Grounding	ScreenSpot Web	Text Accuracy	91
Low-level GUI Execution	AndroidControl Low	Grounding Accuracy	92.5
Cross-app GUI Navigation	GUIOdyssey	Grounding Accuracy	87.4
