Temporal Preference Optimization for Unsupervised Retrieval
Abstract
Unsupervised dense retrievers offer scalability by learning semantic similarity from unlabeled documents via contrastive learning, but they struggle to capture temporal relevance and often retrieve semantically related yet temporally misaligned documents. This matters when a document collection spans multiple time periods: given the query "Who is the president in 2019?", retrieving from related documents spanning 2018-2025 introduces temporal ambiguity. Existing methods rely on supervised training with explicit timestamps, which are not always available. We propose TPOUR (Temporal Preference Optimization for Unsupervised Retriever), which integrates our novel training method, Temporal Retrieval Preference Optimization (TRPO). TRPO reinterprets preference learning in the temporal dimension, guiding the retriever to favor temporally aligned documents. TPOUR further generalizes to unseen time periods via interpolation in a learned time embedding space, enabling continuous temporal alignment. Experiments on temporal QA with a mixed-timestamp document collection show that TPOUR outperforms both unsupervised and supervised baselines. Compared to Nomic Embed v2 MoE, TPOUR Contriever improves nDCG@5 by +7.13 (+23.5%) on explicit and +7.76 (+25.5%) on implicit queries on average.
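To make the idea of temporal preference optimization concrete, the sketch below shows a generic DPO-style pairwise preference loss applied to retrieval scores: the retriever is nudged to score the temporally aligned document above a semantically similar but temporally misaligned one. This is an illustrative assumption of how such a loss could look, not the paper's actual TRPO objective; the function name, the cosine-similarity scoring, and the `beta` temperature are all hypothetical.

```python
import numpy as np

def temporal_preference_loss(q, d_aligned, d_misaligned, beta=1.0):
    """Hypothetical DPO-style preference loss over retrieval scores.

    q            -- query embedding
    d_aligned    -- embedding of the temporally aligned document (preferred)
    d_misaligned -- embedding of the temporally misaligned document (dispreferred)
    beta         -- temperature scaling the score margin (assumed hyperparameter)
    """
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Margin between preferred and dispreferred retrieval scores.
    margin = beta * (cos(q, d_aligned) - cos(q, d_misaligned))
    # Negative log-sigmoid: small when the aligned document already wins,
    # large when the misaligned document outscores it.
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

# Toy embeddings: the "aligned" document is a slight perturbation of the
# query, the "misaligned" one is an unrelated random direction.
rng = np.random.default_rng(0)
q = rng.normal(size=8)
aligned = q + 0.1 * rng.normal(size=8)
misaligned = rng.normal(size=8)
loss = temporal_preference_loss(q, aligned, misaligned)
```

Minimizing this loss over many such (query, aligned, misaligned) triples would push the retriever's similarity ranking toward temporal alignment while leaving semantic matching intact.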