From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG
Changmin Lee ⋅ Jaemin Kim ⋅ Taesik Gong
Abstract
With the rapid emergence of personal AI agents based on Large Language Models (LLMs), running them on-device has become essential for privacy and responsiveness. To handle the inherently personal and context-dependent nature of real-world requests, such agents must ground their generation in device-resident personal context. Under tight memory budgets, however, the core bottleneck is deciding *what to store* so that retrieval remains aligned with the user. We propose EPIC (Efficient Preference-aligned Index Construction), which treats user preferences as a compact and stable form of personal context and integrates them throughout the RAG pipeline. EPIC selectively retains preference-relevant information from raw data and steers retrieval toward preference-aligned contexts. Across four benchmarks covering conversations, debates, explanations, and recommendations, and compared with the best-performing baseline, EPIC reduces indexing memory by 2,404$\times$, improves preference-following accuracy by 20.17 percentage points, and lowers retrieval latency by 33.33$\times$. In our on-device experiment, EPIC maintains a memory footprint under 1 MB with a retrieval latency of 27.9 ms/query under streaming updates. The code is available at