Token-Free Hierarchical Indexing for RAG beyond LLM-based Summarization
Abstract
Retrieval-Augmented Generation (RAG) increasingly relies on hierarchical indexing, yet existing frameworks are bottlenecked by the high cost and information loss of recursive, LLM-based summarization. We propose SeRAG, a token-free hierarchical indexing framework that replaces textual summaries with an information-theoretic knowledge taxonomy. SeRAG first transforms a corpus into a multi-perspective graph capturing semantic, logical, and sequential dependencies, then minimizes structural entropy over this graph to induce a topologically faithful encoding tree. To bridge the gap between abstract themes and granular facts, we introduce a localized, structural-weight-based vector aggregation that consolidates communities without generating any tokens. Extensive experiments demonstrate that SeRAG significantly reduces indexing overhead while outperforming state-of-the-art methods on complex multi-hop reasoning tasks.
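As a rough illustration of the two ingredients named in the abstract, the Python sketch below scores a candidate partition of a weighted graph by its structural entropy under a two-level encoding tree, then pools chunk embeddings inside one community without calling an LLM. The entropy follows the commonly used two-level definition. The toy graph, the function names, and in particular the degree-over-volume weights used for pooling are assumptions made for illustration. They are not taken from SeRAG itself.

```python
import math
from collections import defaultdict


def degrees(edges):
    """Weighted degree of every vertex in an undirected edge list (u, v, w)."""
    deg = defaultdict(float)
    for u, v, w in edges:
        deg[u] += w
        deg[v] += w
    return deg


def structural_entropy(edges, partition):
    """Structural entropy of a graph under a two-level encoding tree.

    `partition` is a list of vertex lists (the communities). Lower values
    mean the partition describes the graph's structure more compactly.
    Assumes no self-loops, so a leaf's cut weight equals its degree.
    """
    deg = degrees(edges)
    vol_g = sum(deg.values())
    member = {v: i for i, block in enumerate(partition) for v in block}
    vol = [sum(deg[v] for v in block) for block in partition]

    # cut[i]: total weight of edges with exactly one endpoint in community i
    cut = [0.0] * len(partition)
    for u, v, w in edges:
        if member[u] != member[v]:
            cut[member[u]] += w
            cut[member[v]] += w

    h = 0.0
    for i, block in enumerate(partition):
        if vol[i] == 0:
            continue
        # Community-level term: cost of entering community i from the root.
        h -= cut[i] / vol_g * math.log2(vol[i] / vol_g)
        # Leaf-level terms: cost of locating a vertex inside its community.
        for v in block:
            if deg[v] > 0:
                h -= deg[v] / vol_g * math.log2(deg[v] / vol[i])
    return h


def aggregate(embeddings, deg, block):
    """Pool member embeddings into one community vector.

    ASSUMPTION: each member is weighted by degree / community volume.
    This is a hypothetical stand-in for the paper's structural weights.
    """
    vol = sum(deg[v] for v in block)
    dim = len(embeddings[block[0]])
    pooled = [0.0] * dim
    for v in block:
        weight = deg[v] / vol
        for k in range(dim):
            pooled[k] += weight * embeddings[v][k]
    return pooled


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
EDGES = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]

if __name__ == "__main__":
    by_triangle = [[0, 1, 2], [3, 4, 5]]
    single_block = [[0, 1, 2, 3, 4, 5]]
    print("triangles:", structural_entropy(EDGES, by_triangle))
    print("one block:", structural_entropy(EDGES, single_block))

    toy_embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
    print("pooled:", aggregate(toy_embeddings, degrees(EDGES), [0, 1, 2]))
```

On this toy graph the partition that follows the two triangles scores lower than lumping every vertex into one community. A search procedure would exploit that signal when building the tree. The pooling step uses only graph weights and existing vectors, so no summary text is ever generated.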