Poster Wed, Jul 8, 2026 • 1:00 AM – 2:45 AM PDT HALL A #1007

SE-GA: Memory-Augmented Self-Evolution for GUI Agents

Shilong Jin ⋅ Lanjun Wang ⋅ Zhuosheng Zhang

Abstract

Autonomous Graphical User Interface (GUI) agents often struggle with multi-step tasks because of constrained context windows and static policies that fail to adapt to dynamic environments. To address these limitations, this work proposes the Self-Evolving GUI Agent (SE-GA), a novel framework that integrates hierarchical memory structures with an iterative self-improvement mechanism. At the core of our approach is Test-Time Memory Extension (TTME), which facilitates long-term planning by dynamically retrieving episodic, semantic, and experiential memories to provide salient context during inference. To ensure continuous learning, we introduce Memory-Augmented Self-Evolution (MASE), a training pipeline that uses the data collected by TTME to stabilize and enhance the agent's foundational policy. Extensive evaluations across both offline and online benchmarks demonstrate that SE-GA achieves state-of-the-art performance, reaching success rates of 89.0% on ScreenSpot and 75.8% on the challenging AndroidControl-High dataset. Furthermore, significant improvements on the AndroidWorld benchmark highlight its superior generalization to dynamic environments.

Lay Summary

People learn to use smartphone applications and computer software through practice: they remember which buttons to press, pick up general rules such as "settings are usually behind a gear icon," and recall how they solved a similar task before. This paper gives AI agents the same ability. Current AI assistants that click and type on screens tend to forget what they did a few steps ago and cannot learn from their past successes or failures. We introduce SE-GA, an AI agent with two key abilities. First, it maintains three types of memory: a short-term record of recent actions, a collection of general interaction rules, and a library of past successful strategies it can look up when facing familiar tasks. Second, it continuously improves itself by practicing tasks, learning from mistakes, and even salvaging useful lessons from failed attempts. In tests across mobile, desktop, and web environments, SE-GA significantly outperforms existing agents, particularly on complex multi-step tasks, and keeps getting better the more it practices.
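
The three memory types described above can be pictured with a minimal Python sketch. This is an illustration only, not the authors' implementation: the class name HierarchicalMemory, the helper _overlap, the window size of 5, and the word-overlap scoring are all assumptions introduced here, and a real system would presumably use a learned retriever rather than word overlap.

```python
from collections import deque
from dataclasses import dataclass, field


def _overlap(a: str, b: str) -> int:
    """Count shared lowercase words (hypothetical stand-in for a real similarity score)."""
    return len(set(a.lower().split()) & set(b.lower().split()))


@dataclass
class HierarchicalMemory:
    """Hypothetical container for the three memory types; all names are illustrative."""

    # Short-term record of recent actions (window size 5 is an assumption).
    episodic: deque = field(default_factory=lambda: deque(maxlen=5))
    # General interaction rules, stored as plain sentences.
    semantic: list = field(default_factory=list)
    # Past successful strategies, stored as (task description, strategy) pairs.
    experiential: list = field(default_factory=list)

    def record_action(self, action: str) -> None:
        """Append an action; the oldest one is dropped once the window is full."""
        self.episodic.append(action)

    def retrieve(self, task: str, top_k: int = 1) -> dict:
        """Return recent actions plus the best-matching rules and strategies."""
        rules = sorted(self.semantic, key=lambda r: _overlap(task, r), reverse=True)
        past = sorted(self.experiential, key=lambda e: _overlap(task, e[0]), reverse=True)
        return {
            "recent_actions": list(self.episodic),
            # Keep only entries that share at least one word with the task.
            "rules": [r for r in rules[:top_k] if _overlap(task, r) > 0],
            "strategies": [s for t, s in past[:top_k] if _overlap(task, t) > 0],
        }


memory = HierarchicalMemory()
memory.semantic.append("settings are usually behind a gear icon")
memory.semantic.append("search is usually behind a magnifier icon")
memory.experiential.append(
    ("turn on wifi in settings", "open settings, tap network, toggle wifi")
)
memory.record_action("tap home")

context = memory.retrieve("open settings and turn on bluetooth")
print(context)
```

In this sketch the retrieved dictionary would be prepended to the agent's prompt at inference time, so the policy sees recent actions, a relevant rule, and a related past strategy without holding the whole history in its context window.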