Training-Free Hierarchical Working Memory for Small Language Model Agents
Abstract
Small language models (SLMs) are attractive for agent deployment, but they struggle to reliably retain and reuse decision-relevant state over long interactions. The problem is exacerbated when working memory is maintained as unstructured natural-language summaries. Recent work addresses this limitation by fine-tuning or distilling smaller models to better construct and use working memory, but such approaches typically incur substantial additional training cost and require ongoing data construction. We present a training-free working-memory framework for SLM-based agents that makes decision-relevant state explicit: conditioned on the active (sub)goal, the agent maintains both a compact information state for assessing progress and the set of currently effective actions. Our approach decomposes tasks into subgoals and organizes memory hierarchically into task-level global memory and subtask-level local memory; local memory directly conditions the SLM's action selection and is updated from new observations. To instantiate these goal-conditioned memories without parameter updates, we introduce an offline LLM-based induction pipeline that builds a reusable schema once per task family from a small number of representative traces. Training-free here means no parameter updates to the deployed SLM and no online LLM calls; the only LLM use is the one-time offline schema induction per task family. On ALFWorld valid_unseen, a 4B SLM achieves a 0.910 success rate, while representative prompting and prior working-memory baselines under the same setting remain below 0.320.
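To make the memory organization concrete, the following is a minimal sketch of how the hierarchical, goal-conditioned working memory described in the abstract could be represented. It is an illustration under our own assumptions, not the paper's implementation: all names (GlobalMemory, LocalMemory, info_state, effective_actions, the schema format) are hypothetical.

```python
from dataclasses import dataclass, field

# Illustrative sketch only. Class and field names are hypothetical,
# not the paper's actual data structures.

@dataclass
class LocalMemory:
    """Subtask-level memory conditioned on the active subgoal."""
    subgoal: str
    # Compact information state needed to assess progress on the subgoal.
    info_state: dict = field(default_factory=dict)
    # Actions currently effective under this subgoal.
    effective_actions: list = field(default_factory=list)

    def update(self, observation: str, schema: dict) -> None:
        # The schema (induced offline by an LLM, once per task family)
        # maps observation patterns to slots of the information state.
        for slot, pattern in schema.get(self.subgoal, {}).items():
            if pattern in observation:
                self.info_state[slot] = observation


@dataclass
class GlobalMemory:
    """Task-level memory: subgoal decomposition and completion record."""
    task: str
    subgoals: list
    completed: list = field(default_factory=list)

    def active_subgoal(self) -> str | None:
        remaining = [g for g in self.subgoals if g not in self.completed]
        return remaining[0] if remaining else None
```

Under this reading, the SLM's prompt at each step would be built from the active subgoal's local memory alone (its information state and effective action set) rather than from the full interaction history, which is what keeps the conditioning context compact.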