Poster
Peripheral Memory for LLMs: Integration of Sequential Memory Banks with Adaptive Querying
Songlin Zhai · Yuan Meng · Yongrui Chen · Yiwei Wang · Guilin Qi
East Exhibition Hall A-B #E-2401
Large Language Models (LLMs) have revolutionized various natural language processing tasks with their remarkable capabilities. However, a challenge persists in effectively processing new information, particularly in the area of long-term knowledge updates, without compromising model performance. To address this challenge, this paper introduces a novel memory augmentation framework that conceptualizes memory as a peripheral component (akin to physical RAM), with the LLM serving as the information processor (analogous to a CPU). Drawing inspiration from RAM architecture, we design memory as a sequence of memory banks, each modeled with a Kolmogorov-Arnold Network (KAN) to ensure smooth state transitions. Memory read and write operations are dynamically controlled by query signals derived from the LLM's internal states, closely mimicking the interaction between a CPU and RAM. Furthermore, a dedicated memory bank generates a mask value that indicates the relevance of the retrieved data, inspired by the sign bit in binary coding schemes. The retrieved memory feature is then integrated as a prefix to enhance model prediction. Extensive experiments on knowledge-based model editing validate the effectiveness and efficiency of our peripheral memory.
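To make the read path concrete, below is a minimal sketch of the peripheral-memory mechanism described in the abstract, written in PyTorch. It is not the authors' implementation: the module names (PeripheralMemory, MemoryBank), the dimensions, the mean aggregation over banks, and the use of a small MLP as a stand-in for each KAN bank are all illustrative assumptions. It shows the pattern of deriving a query from the LLM's internal state, reading a sequence of memory banks, gating the readout with a relevance mask from a dedicated bank, and returning a prefix feature.

```python
# Hedged sketch of the peripheral-memory read path.
# Assumptions: each KAN bank is approximated by a two-layer MLP,
# bank outputs are mean-aggregated, and the mask is a sigmoid gate.
import torch
import torch.nn as nn


class MemoryBank(nn.Module):
    """One memory bank; the paper models each bank with a KAN,
    approximated here by a small MLP for brevity."""
    def __init__(self, query_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(query_dim, out_dim),
            nn.SiLU(),  # smooth activation, echoing smooth state transitions
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        return self.net(query)


class PeripheralMemory(nn.Module):
    """Sequence of memory banks read via a query signal derived from
    the LLM's internal state; a dedicated bank emits a relevance mask
    (akin to a sign bit) that gates the retrieved feature."""
    def __init__(self, hidden_dim: int, mem_dim: int, n_banks: int = 4):
        super().__init__()
        self.to_query = nn.Linear(hidden_dim, mem_dim)   # query from LLM state
        self.banks = nn.ModuleList(
            MemoryBank(mem_dim, mem_dim) for _ in range(n_banks)
        )
        self.mask_bank = MemoryBank(mem_dim, 1)          # relevance-mask bank
        self.to_prefix = nn.Linear(mem_dim, hidden_dim)  # map readout to prefix

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        query = self.to_query(hidden_state)                           # (B, mem_dim)
        readout = torch.stack([b(query) for b in self.banks]).mean(0)
        mask = torch.sigmoid(self.mask_bank(query))                   # 0 = irrelevant
        return self.to_prefix(mask * readout)                         # prefix feature


if __name__ == "__main__":
    mem = PeripheralMemory(hidden_dim=768, mem_dim=256)
    h = torch.randn(2, 768)   # mock LLM hidden states
    prefix = mem(h)           # (2, 768) prefix embedding prepended to the model input
    print(prefix.shape)
```

In this sketch the mask multiplies the aggregated readout before projection, so an irrelevant retrieval is driven toward zero and leaves the base model's prediction largely untouched; the actual masking and write-path mechanics in the paper may differ.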
Large language models (AI systems like ChatGPT) struggle to update their knowledge efficiently: adding new facts often requires expensive retraining or causes performance drops. Existing memory systems for these models are rigidly tied to their core architecture, making them hard to scale, share between different models, or adapt to new tasks. We designed a new "plug-and-play" memory system inspired by how computers separate processors (CPUs) from memory (RAM). Our system: (1) works as a standalone memory bank that any language model can access; (2) uses smarter storage units to handle conflicting updates; (3) includes a quality-check filter to prevent irrelevant or outdated information from affecting responses.