Experience-Evolving Multi-Turn Tool-Use Agent with Hybrid Episodic–Procedural Memory
Abstract
As user intents unfold and environments change, multi-turn agents face continuously shifting decision contexts. Although reusing past experience is intuitively appealing, existing approaches remain limited: full trajectories are often too context-specific to transfer, while tool-level reuse ignores the surrounding context and environment. In this paper, we introduce a hybrid episodic–procedural memory strategy (H-EPM) that enables experience-induced self-evolution of multi-turn tool-use policies by adaptively reusing partially overlapping successful experiences in both inference and training. Inspired by human episodic–procedural integration, we build a tool graph from accumulated trajectories, in which recurring tool-to-tool dependencies capture procedural routines and each edge is augmented with a compact episodic summary of its relevant context. At inference time, the agent dynamically balances episodic recall for contextual reasoning with procedural execution for routine steps. Beyond inference, H-EPM introduces a memory-guided reinforcement learning paradigm that directly addresses a core challenge in multi-turn agent RL: ineffective exploration over long trajectories. By biasing exploration toward historically successful tool transitions, H-EPM learns a stronger policy that generalizes at inference time without relying on domain-specific experience collection. Experiments show that H-EPM consistently delivers substantial inference-time gains over strong baselines across multi-turn tool-use benchmarks, with improvements exceeding 50\% in the best case. It also boosts RL policy performance, achieving improvements of over 40\% on out-of-distribution tasks.
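The tool-graph memory sketched in the abstract could be realized along the following lines. This is a minimal illustrative sketch, not the paper's implementation; the class and method names (`ToolGraphMemory`, `add_trajectory`, `next_tools`) are assumptions introduced here. Each edge stores a transition count (the procedural routine signal) alongside compact episodic summaries of the contexts in which that transition succeeded.

```python
from collections import defaultdict

class ToolGraphMemory:
    """Hypothetical sketch of a hybrid episodic-procedural tool graph.

    Edges between tools record how routine a transition is (count)
    plus short episodic summaries of the contexts where it succeeded.
    """

    def __init__(self):
        # (src_tool, dst_tool) -> {"count": int, "episodes": [str]}
        self.edges = defaultdict(lambda: {"count": 0, "episodes": []})

    def add_trajectory(self, tool_calls, context_summary):
        """Fold one successful trajectory into the graph."""
        for src, dst in zip(tool_calls, tool_calls[1:]):
            edge = self.edges[(src, dst)]
            edge["count"] += 1                        # procedural routine strength
            edge["episodes"].append(context_summary)  # episodic context summary

    def next_tools(self, current_tool):
        """Candidate next tools, ranked by how routine the transition is."""
        candidates = [(dst, e["count"])
                      for (src, dst), e in self.edges.items()
                      if src == current_tool]
        return sorted(candidates, key=lambda t: -t[1])

# Usage: two successful trajectories with a partially overlapping prefix.
mem = ToolGraphMemory()
mem.add_trajectory(["search", "fetch", "summarize"], "news lookup")
mem.add_trajectory(["search", "fetch", "translate"], "multilingual query")
print(mem.next_tools("fetch"))  # ranked procedural candidates after "fetch"
```

A policy could consult `next_tools` to bias exploration toward historically successful transitions while using the stored episodic summaries to decide whether the current context actually matches a past episode.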