ElicitR: Unlocking Latent Reasoning in Dense Retrievers via Generative Regularization
Abstract
Reasoning-intensive retrieval is increasingly important for downstream applications, requiring more than lexical overlap or coarse semantic matching. While prior work mainly relies on Language Models (LMs) to synthesize reasoning-oriented supervision, we posit that such reasoning ability is already latent in LM-based retrievers but suppressed by contrastive overfitting. To elicit this latent reasoning, we introduce ElicitR, a retriever–LM framework with generative regularization that captures nuanced relationships among a query and its candidate documents beyond binary relevance. Concretely, alongside contrastive learning, we regularize the retriever by co-training a small LM on query–positive–negative batches. Next-token prediction (NTP) for each text is conditioned on its prefix and the other in-batch texts, with cross-text conditioning weighted by retriever-computed similarities. Using MS MARCO as the only paired query–document supervision and a 135M LM for generative regularization with unlabeled raw-text initialization, ElicitR consistently improves performance on BRIGHT by 16–29% (relative) across 0.1B–3B retriever scales while maintaining performance on BEIR. At 3B, ElicitR reaches an nDCG@10 of 23.1, substantially outperforming larger models trained with far more curated pairs and proprietary APIs. Further analyses show that ElicitR prevents overfitting, improves retrieval calibration, and remains robust across batch sizes, supporting its practicality.
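The similarity-weighted cross-text conditioning described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the softmax weighting, and the assumption that per-pair NTP losses are precomputed are all hypothetical simplifications for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def generative_regularizer(sim, ntp_loss):
    """Hypothetical sketch of the generative regularization term.

    sim:      (B, B) retriever similarity matrix among the in-batch
              texts (query, positive, and negatives flattened together).
    ntp_loss: (B, B) where ntp_loss[i, j] is the small LM's next-token
              loss for text i when conditioned on its own prefix plus
              in-batch text j (assumed precomputed here).

    Returns a scalar: each text's cross-text NTP losses, weighted by a
    softmax over its retriever similarities to the *other* in-batch
    texts (the diagonal is masked so no text conditions on itself).
    """
    B = sim.shape[0]
    mask = ~np.eye(B, dtype=bool)
    # softmax over off-diagonal similarities; masked entries get weight 0
    w = softmax(np.where(mask, sim, -np.inf), axis=1)
    return float((w * np.where(mask, ntp_loss, 0.0)).sum(axis=1).mean())
```

In this sketch, a higher retriever similarity between two in-batch texts increases how strongly one text's generation is conditioned on the other, which is one plausible way to propagate graded (rather than binary) relevance signals back into the retriever.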