Closing the Feedback Loop: From Experience Extraction to Insight Governance in Verbal Reinforcement Learning
Abstract
Training-free verbal reinforcement learning enables LLM agents to learn from world feedback (objective signals such as dynamic task outcomes, market returns, or demand forecasts) by extracting verbal rules from experience and injecting them as context, which updates the agent's behavior without parameter changes. In non-stationary environments, however, these agents face a retention-forgetting dilemma: retaining stale insights causes negative transfer, while discarding them causes catastrophic forgetting when conditions recur. We identify four requirements for navigating this dilemma (outcome-driven evaluation, persistent structured evidence, a non-monotonic knowledge lifecycle, and compositional governance) and show that existing methods invest heavily in experience extraction while underinvesting in insight governance. We propose a three-layer architecture of rules, evidence, and skills, connected by a feedback-driven curation loop that closes this governance gap. Rules capture experience distilled from world outcomes; evidence logs track each rule's reliability across episodes; skills govern which rules to apply, how to resolve conflicts, and when to abstain. Using financial forecasting as a case study, where world feedback is naturally abundant, noisy, and non-stationary, we show that the same accumulated experience either degrades performance below the zero-shot baseline or substantially improves accuracy and risk-adjusted returns, depending on whether the curation loop is present.
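To make the three layers concrete, the following is a minimal illustrative sketch, not the implementation evaluated in this work. All names (`Rule`, `EvidenceLog`, `curate`, `select_rules`), the sliding-window reliability score, and the numeric thresholds are assumptions introduced here for illustration. The sketch also assumes that retired rules continue to be scored against outcomes so that they can be revived when conditions recur.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Rule:
    """Layer 1: a verbal rule distilled from experience (hypothetical schema)."""
    rule_id: str
    text: str
    active: bool = True  # retired rules are kept, not deleted


@dataclass
class EvidenceLog:
    """Layer 2: per-rule outcome history across episodes (hypothetical schema)."""
    outcomes: Dict[str, List[bool]] = field(default_factory=dict)

    def record(self, rule_id: str, success: bool) -> None:
        # Append one world-feedback outcome for this rule.
        self.outcomes.setdefault(rule_id, []).append(success)

    def reliability(self, rule_id: str, window: int = 5) -> Optional[float]:
        # Success rate over the most recent `window` outcomes (assumed metric).
        recent = self.outcomes.get(rule_id, [])[-window:]
        if not recent:
            return None
        return sum(recent) / len(recent)


def curate(rules: List[Rule], log: EvidenceLog,
           retire_below: float = 0.4, revive_above: float = 0.6) -> None:
    """Curation loop: retire stale rules and revive them when evidence recovers.

    The thresholds are illustrative assumptions, not values from this work.
    """
    for rule in rules:
        score = log.reliability(rule.rule_id)
        if score is None:
            continue  # no evidence yet; leave the rule as it is
        if rule.active and score < retire_below:
            rule.active = False  # retire instead of delete: non-monotonic lifecycle
        elif not rule.active and score > revive_above:
            rule.active = True


def select_rules(rules: List[Rule], log: EvidenceLog,
                 min_reliability: float = 0.5) -> List[Rule]:
    """Layer 3 (skill): choose which rules to inject; an empty list means abstain."""
    chosen = []
    for rule in rules:
        if not rule.active:
            continue
        score = log.reliability(rule.rule_id)
        # Untested rules are admitted; tested rules must clear the bar.
        if score is None or score >= min_reliability:
            chosen.append(rule)
    return chosen
```

In this sketch, the text of the rules returned by `select_rules` would be placed in the agent's context for the next episode, and each observed outcome would be written back through `EvidenceLog.record` before `curate` runs again. Conflict resolution between rules is omitted for brevity.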