MemDecoder: Enhancing Test-Time Compute for LLM Agents via Reinforced Memory Decoding
Abstract
Agentic memory—conditioning large language and vision–language models on past cases, external knowledge, or meta‑experiences—has become a key mechanism for improving inference‑time reasoning. However, existing approaches largely rely on heuristic retrieval or expensive LLM‑based reranking, and do not explicitly learn how to compose memory for a given query. To address these limitations, we propose MemDecoder, a learned framework for adaptive agentic memory selection. MemDecoder formulates memory composition as an autoregressive index decoding problem over a retrieved candidate set, using a lightweight Transformer encoder–decoder to generate an ordered sequence of memory elements. This design enables efficient, task‑aware few‑shot reasoning without generating textual demonstrations. MemDecoder can be trained via supervised fine‑tuning and reinforcement learning with verifiable rewards. We further introduce a ranking‑aware variant of Group Relative Policy Optimization that exploits pairwise comparisons within response groups to provide richer learning signals. Experiments across visual question answering, mathematical reasoning, and scientific question answering benchmarks show that MemDecoder consistently outperforms prior agentic memory selection methods, demonstrating the benefits of both its architectural design and its learning algorithm.
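To make the index-decoding formulation concrete, the sketch below shows a toy Transformer encoder–decoder that greedily emits an ordered sequence of memory indices over a retrieved candidate set. This is a minimal illustration under stated assumptions, not the paper's implementation: the class name MemIndexDecoder, the layer sizes, greedy decoding, and the pre-computed query/candidate embeddings are all illustrative choices.

```python
# Minimal sketch (assumed, not the authors' code): autoregressive index decoding
# over a retrieved candidate set with a small Transformer encoder-decoder.
import torch
import torch.nn as nn


class MemIndexDecoder(nn.Module):
    """Scores retrieved candidates and autoregressively emits memory indices."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers,
        )
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers,
        )
        self.bos = nn.Parameter(torch.zeros(1, 1, d_model))  # start-of-sequence token

    def forward(self, query_emb: torch.Tensor, cand_embs: torch.Tensor, k: int = 4):
        """query_emb: (B, 1, d), cand_embs: (B, N, d) -> (B, k) ordered indices."""
        batch = query_emb.size(0)
        # Encode the query jointly with all retrieved candidates.
        memory = self.encoder(torch.cat([query_emb, cand_embs], dim=1))
        tgt = self.bos.expand(batch, -1, -1)
        chosen = []
        picked_mask = torch.zeros(cand_embs.shape[:2], dtype=torch.bool)
        for _ in range(k):
            hidden = self.decoder(tgt, memory)[:, -1]             # (B, d)
            scores = torch.einsum("bd,bnd->bn", hidden, cand_embs)
            scores = scores.masked_fill(picked_mask, float("-inf"))  # forbid repeats
            idx = scores.argmax(dim=-1)                           # greedy index pick
            picked_mask[torch.arange(batch), idx] = True
            chosen.append(idx)
            # Feed the chosen candidate's embedding back as the next decoder input.
            picked = cand_embs[torch.arange(batch), idx].unsqueeze(1)
            tgt = torch.cat([tgt, picked], dim=1)
        return torch.stack(chosen, dim=1)


if __name__ == "__main__":
    model = MemIndexDecoder()
    q = torch.randn(2, 1, 256)        # query embeddings for a batch of 2
    cands = torch.randn(2, 16, 256)   # 16 retrieved candidate memories each
    print(model(q, cands, k=4))       # ordered memory indices per query
```

In an actual system, greedy argmax selection could be replaced by sampling during reinforcement learning, with a verifiable task reward scoring the downstream answer produced by the LLM conditioned on the selected memories.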