

Poster

USTAD: Unified Single-model Training Achieving Diverse Scores for Information Retrieval

Seungyeon Kim · Ankit Singh Rawat · Manzil Zaheer · Sadeep Jayasumana · Veeranjaneyulu Sadhanala · Wittawat Jitkrittum · Aditya Menon · Rob Fergus · Sanjiv Kumar


Abstract:

Modern information retrieval (IR) systems consist of multiple stages, such as retrieval and ranking, with Transformer-based models achieving state-of-the-art performance at each stage. In this paper, we challenge the tradition of using separate models for different stages and ask whether a single Transformer encoder can provide the relevance scores needed at each stage. We present USTAD, a new unified approach to train a single network that can provide powerful ranking scores as a cross-encoder (CE) model as well as factorized embeddings for large-scale retrieval as a dual-encoder (DE) model. Empirically, we find a single USTAD model to be competitive with separately trained ranking CE and retrieval DE models. Furthermore, USTAD combines well with a novel embedding-matching-based distillation, significantly improving CE-to-DE distillation. This further motivates novel asymmetric architectures for student models, ensuring better embedding alignment between student and teacher while keeping online inference cost small. On standard benchmarks like MSMARCO, we demonstrate that USTAD, together with our proposed distillation method, enables effective distillation to asymmetric students 1/10th the size of the teacher that retain 95-97% of the teacher's performance.
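The abstract describes a single encoder serving two scoring modes (joint CE scoring and factorized DE embeddings) plus an embedding-matching distillation objective. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: all names (`UnifiedEncoder`, `ce_score`, `de_embed`, `embedding_matching_loss`) and the pooling, head, and loss choices are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedEncoder(nn.Module):
    """One Transformer encoder used in two modes (hypothetical sketch):
    - cross-encoder (CE): jointly encode a [query; doc] sequence, score with a head
    - dual-encoder (DE): encode query and doc separately into embeddings
    """
    def __init__(self, vocab_size=30522, dim=256, layers=4, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.score_head = nn.Linear(dim, 1)  # CE relevance head

    def encode(self, token_ids):
        # Mean-pool encoder outputs into one vector per sequence
        # (pooling strategy is an assumption).
        h = self.encoder(self.embed(token_ids))
        return h.mean(dim=1)

    def ce_score(self, joint_ids):
        # CE mode: score a concatenated (query, doc) token sequence.
        return self.score_head(self.encode(joint_ids)).squeeze(-1)

    def de_embed(self, ids):
        # DE mode: factorized embedding for large-scale retrieval;
        # DE relevance is the dot product of query and doc embeddings.
        return F.normalize(self.encode(ids), dim=-1)

def embedding_matching_loss(student_emb, teacher_emb):
    # One plausible form of embedding-matching distillation: align the
    # student's embeddings with the frozen teacher's (loss form assumed).
    return F.mse_loss(student_emb, teacher_emb.detach())
```

Under this sketch, an asymmetric student would pair a small query encoder with a larger document encoder (or reuse the teacher's document embeddings), so the online query-side inference stays cheap while the embedding-matching loss keeps the student's space aligned with the teacher's.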
