Textual Stochastic Gradient Descent: Discrete Optimization of External Memory for Reasoning Language Agents
Jian Li ⋅ Hua Huang
Abstract
While Large Language Models (LLMs) possess strong reasoning capabilities, enabling them to learn continuously from experience without parametric retraining remains an open challenge. Existing Retrieval-Augmented Generation (RAG) approaches typically treat memory as a static or append-only corpus, leading to ``memory saturation''---where accumulating noise and redundant information degrade performance over time. To address this, we propose an Experience Risk Minimization (ERM) framework that formalizes the experience library as a learnable parameter under an explicit capacity budget. We introduce Textual Stochastic Gradient Descent (TSGD), a discrete optimization algorithm that refines this library via failure-driven Add, Edit, and Delete operations. TSGD estimates ``textual gradients'' through self-reflection and employs a dual-verification mechanism to ensure generalization, effectively preventing overfitting to local errors. Empirical results on MATH and AIME benchmarks demonstrate that TSGD achieves state-of-the-art performance, improving accuracy by up to 18.7\% over zero-shot baselines and significantly outperforming static RAG, all while maintaining a compact memory footprint (compressing hundreds of experiences into $\approx$30 high-utility rules).
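The update loop described above can be sketched in simplified form. This is an illustrative sketch only, not the paper's implementation: the actual TSGD operates on natural-language rules via LLM self-reflection, whereas here the textual-gradient and dual-verification steps are stubbed with placeholder functions, and all names (`textual_gradient`, `verify`, `tsgd_step`, `CAPACITY`) are assumptions introduced for illustration.

```python
# Hypothetical sketch of a TSGD-style update on a capacity-bounded
# experience library. The real method uses an LLM for reflection and
# verification; both are stubbed here with trivial stand-ins.

CAPACITY = 30  # explicit capacity budget on the experience library


def textual_gradient(failure: str) -> str:
    """Stand-in for LLM self-reflection: turn a failure into a candidate rule."""
    return f"Rule derived from failure: {failure}"


def verify(library: list[str], candidate: str) -> bool:
    """Stand-in for dual verification: accept only rules not already covered."""
    return candidate not in library  # placeholder generalization check


def tsgd_step(library: list[str], failure: str) -> list[str]:
    """One failure-driven update: Add a new rule, or Edit an old slot
    once the capacity budget is exhausted; rejected candidates are dropped."""
    candidate = textual_gradient(failure)
    if not verify(library, candidate):
        return library  # rejected: would overfit to a local error
    if len(library) < CAPACITY:
        library.append(candidate)  # Add
    else:
        library[0] = candidate  # Edit/Delete: overwrite a low-utility slot
    return library


library: list[str] = []
for failure in ["miscounted cases", "sign error", "miscounted cases"]:
    library = tsgd_step(library, failure)
print(len(library))  # the duplicate failure is rejected by verification
```

In this toy run the third failure duplicates the first, so its candidate rule is rejected and the library ends with two rules; the design point is that verification, not retrieval, gates what enters memory.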