Prefix-cache-aware data reordering for LLM-augmented database analytics
Yingze Li ⋅ Dong Wang ⋅ Yiming Guo ⋅ Yao Chen ⋅ Hongzhi Wang ⋅ Bingsheng He
Abstract
LLM-augmented database analytics faces a major bottleneck in the costly prefill phase. Although relational tables inherently contain repeated attribute values, standard row-by-row processing produces fragmented prompt layouts that obscure shared prefixes, limiting opportunities for prefix KV-cache reuse and constraining system efficiency. Existing solutions typically reorder prompt layouts with heuristic or exhaustive search, but these approaches can be inefficient and may not exploit the structural properties of relational tables. We address this challenge by formulating prefix-cache-aware prompt layout optimization as a problem rooted in the isomorphism between prefix-cache reuse and the radix tree topology induced by the relational data distribution. Building on this perspective, we introduce a practical greedy tree-shaping algorithm that efficiently selects row and column orderings to maximize prefix overlap. Our approach, SOLO, improves prefill throughput by up to 90.3% under a fixed prefix-cache budget, and reduces planning overhead by up to 242× compared to state-of-the-art baselines.
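To make the core idea concrete, here is a minimal illustrative sketch of row-and-column reordering for prefix overlap. It is an assumed heuristic (place low-cardinality columns first, then sort rows lexicographically so rows with equal leading attributes become adjacent), not the paper's actual SOLO algorithm; the function names and the prefix-overlap proxy metric are invented for illustration.

```python
def greedy_layout(rows):
    """Hypothetical sketch: reorder columns (fewest distinct values first),
    then sort rows lexicographically so that rows sharing attribute values
    become adjacent, lengthening shared prompt prefixes for KV-cache reuse.
    Illustrative heuristic only, not the paper's SOLO algorithm."""
    n_cols = len(rows[0])
    # Columns with many repeated values go first, so adjacent rows agree
    # on a longer leading prefix of attributes.
    col_order = sorted(range(n_cols), key=lambda c: len({r[c] for r in rows}))
    reordered = [tuple(r[c] for c in col_order) for r in rows]
    # Lexicographic sort clusters rows with identical prefixes together.
    reordered.sort()
    return reordered

def shared_prefix_tokens(rows):
    """Proxy metric: count attribute values each row shares, position by
    position, with the previous row -- i.e., reusable cache entries."""
    total = 0
    for prev, cur in zip(rows, rows[1:]):
        for a, b in zip(prev, cur):
            if a != b:
                break
            total += 1
    return total
```

On a toy table such as `[("US","NY",1), ("DE","B",2), ("US","NY",3), ("DE","B",4)]`, the original row order shares no adjacent prefixes, while the reordered layout groups the `("US","NY")` and `("DE","B")` rows together, so their leading attributes can be served from the prefix cache.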