From Interactions to Principles: Experience-Driven Self-Distillation for Evolving LLM Agents
Abstract
LLM agents have achieved strong performance in tool-augmented reasoning, but most remain largely stateless: after each episode, the agent discards its interaction traces and does not accumulate reusable strategies. Prior work either stores raw trajectories for case-based reuse or relies on external teacher models to write reflections, approaches that, respectively, limit generalization or leave the agent's policy unchanged. We introduce EvolveR, an experience-driven framework that enables an agent to improve from its own interaction history. EvolveR maintains an experience base of distilled strategic principles derived from past trajectories. In an offline phase, the agent self-distills successful and failed trajectories into concise principles, applies semantic deduplication, and assigns each principle an empirical utility score for maintenance and pruning. In an online phase, the agent retrieves top-ranked principles to guide reasoning and tool usage, generating new trajectories. We then perform policy evolution with reinforcement learning on these experience-conditioned trajectories, reinforcing behaviors that effectively retrieve and apply useful principles. We demonstrate the effectiveness of EvolveR on complex multi-hop question-answering benchmarks, where it outperforms strong agentic baselines. Our work presents a comprehensive blueprint for agents that learn not only from external data but also from the consequences of their own actions, paving the way for more autonomous and continuously improving systems.
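As a concrete, simplified illustration of the experience-base lifecycle the abstract describes (deduplicated insertion of distilled principles, utility-ranked retrieval, and utility-based pruning), consider the sketch below. All names here (`ExperienceBase`, `Principle`, the similarity threshold, the running-average utility update) are our own placeholders for exposition, not the paper's actual components; the principle distillation step and the embedding function are assumed to be supplied externally.

```python
from dataclasses import dataclass

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

@dataclass
class Principle:
    """A distilled strategic principle with an empirical utility estimate."""
    text: str
    utility: float = 0.0  # running estimate from episode outcomes
    uses: int = 0

class ExperienceBase:
    """Hypothetical sketch of an experience base of distilled principles."""

    def __init__(self, embed, sim_threshold: float = 0.9):
        self.embed = embed  # caller-supplied text -> vector function
        self.sim_threshold = sim_threshold
        self.principles: list[Principle] = []

    def add(self, text: str) -> None:
        # Offline phase: semantic deduplication on insertion -- skip
        # principles too similar to one already stored.
        vec = self.embed(text)
        for p in self.principles:
            if cosine(vec, self.embed(p.text)) >= self.sim_threshold:
                return
        self.principles.append(Principle(text))

    def retrieve(self, k: int = 3) -> list[Principle]:
        # Online phase: surface the top-k principles by empirical utility
        # to condition the agent's reasoning and tool use.
        return sorted(self.principles, key=lambda p: p.utility, reverse=True)[:k]

    def update_utility(self, principle: Principle, reward: float) -> None:
        # Running-average utility update from the outcome of an episode
        # in which this principle was applied.
        principle.uses += 1
        principle.utility += (reward - principle.utility) / principle.uses

    def prune(self, min_utility: float = 0.1, min_uses: int = 5) -> None:
        # Maintenance: drop well-tried principles whose utility stays low.
        self.principles = [
            p for p in self.principles
            if p.uses < min_uses or p.utility >= min_utility
        ]
```

In this sketch, the RL policy-evolution step is outside the class: the rewards it produces per episode would be fed back through `update_utility`, closing the loop between online trajectories and offline experience maintenance.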