CrossQ: Task-Aligned Cross-Token Conditional Quantization for Late Interaction Retrieval
Abstract
Late-interaction retrievers like ColBERT achieve high retrieval quality but suffer from large multi-vector indices. Standard compression minimizes token reconstruction error, whereas ranking depends critically on preserving the scores of a sparse set of "winner" tokens. We introduce CrossQ, which adaptively allocates precision within each document by conditioning token codes on lightweight document context that is computed at indexing time but not stored. Trained with ranking-aligned objectives that preserve candidate score distributions and protect hard-negative margins, CrossQ improves MRR@10 by +0.012 on MS MARCO and average nDCG@10 by +0.018 on BEIR over strong baselines at matched footprints (2-8 B/token). With light fine-tuning, CrossQ achieves 61x compression while narrowing the gap to full-precision ColBERT to just 2.3% MRR@10, establishing a new state of the art in the latency-quality tradeoff for memory-constrained retrieval.
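The core idea of the abstract can be sketched in a toy form: compute a lightweight document context at indexing time, use it to decide which tokens get finer quantization, and discard the context afterward. The function name, the mean-embedding context, and the salience heuristic below are illustrative assumptions for this sketch, not the paper's actual method.

```python
import numpy as np

def crossq_encode(doc_tokens, coarse_levels=4, fine_levels=16, top_frac=0.25):
    """Toy sketch of context-conditional quantization (hypothetical API).

    doc_tokens: (n, d) float array of token embeddings for one document.
    A lightweight document context (here, the mean token embedding) is
    computed at indexing time but NOT stored; tokens most aligned with it
    (a stand-in for likely "winner" tokens) receive finer quantization.
    """
    ctx = doc_tokens.mean(axis=0)          # lightweight context, discarded after encoding
    salience = doc_tokens @ ctx            # alignment of each token with document context
    k = max(1, int(top_frac * len(doc_tokens)))
    fine_idx = set(np.argsort(-salience)[:k].tolist())  # tokens to protect

    codes = np.empty_like(doc_tokens)
    for i, tok in enumerate(doc_tokens):
        levels = fine_levels if i in fine_idx else coarse_levels
        # uniform scalar quantization per token at the chosen level budget
        lo, hi = tok.min(), tok.max()
        step = (hi - lo) / (levels - 1) if hi > lo else 1.0
        codes[i] = np.round((tok - lo) / step) * step + lo
    return codes
```

In this sketch the "protected" tokens are reconstructed with roughly 4x finer steps than the rest, mimicking unequal precision allocation within a document at a fixed average bit budget.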