Timezone: »
Transformer-based models such as BERT have proven successful in information retrieval problem, which seek to identify relevant documents for a given query. There are two broad flavours of such models: cross-attention (CA) models, which learn a joint embedding for the query and document, and dual-encoder (DE) models, which learn separate embeddings for the query and document. Empirically, CA models are often found to be more accurate, which has motivated a series of works seeking to bridge this gap. However, a more fundamental question remains less explored: does this performance gap reflect an inherent limitation in the capacity of DE models, or a limitation in the training of such models? And does such an understanding suggest a principled means of improving DE models? In this paper, we study these questions, with three contributions. First, we establish theoretically that with a sufficiently large embedding dimension, DE models have the capacity to model a broad class of score distributions. Second, we show empirically that on real-world problems, DE models may overfit to spurious correlations in the training set, and thus under-perform on test samples. To mitigate this behaviour, we propose a suitable distillation strategy, and confirm its practical efficacy on the MSMARCO-Passage and Natural Questions benchmarks.
Author Information
Aditya Menon (Google Research)
Sadeep Jayasumana (Google Research)
Ankit Singh Rawat (Google)
Seungyeon Kim (Google Research)
Sashank Jakkam Reddi (Google)
Sanjiv Kumar (Google Research, NY)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Poster: In defense of dual-encoders for neural ranking »
Thu. Jul 21st through Fri the 22nd Room Hall E #404
More from the Same Authors
-
2023 : SpecTr: Fast Speculative Decoding via Optimal Transport »
Ziteng Sun · Ananda Suresh · Jae Ro · Ahmad Beirami · Himanshu Jain · Felix Xinnan Yu · Michael Riley · Sanjiv Kumar -
2023 Poster: On the Role of Attention in Prompt-tuning »
Samet Oymak · Ankit Singh Rawat · Mahdi Soltanolkotabi · Christos Thrampoulidis -
2023 Poster: A Statistical Perspective on Retrieval-Based Models »
Soumya Basu · Ankit Singh Rawat · Manzil Zaheer -
2023 Poster: Efficient Training of Language Models using Few-Shot Learning »
Sashank Jakkam Reddi · Sobhan Miryoosefi · Stefani Karp · Shankar Krishnan · Satyen Kale · Seungyeon Kim · Sanjiv Kumar -
2022 Poster: Private Adaptive Optimization with Side information »
Tian Li · Manzil Zaheer · Sashank Jakkam Reddi · Virginia Smith -
2022 Poster: Robust Training of Neural Networks Using Scale Invariant Architectures »
Zhiyuan Li · Srinadh Bhojanapalli · Manzil Zaheer · Sashank Jakkam Reddi · Sanjiv Kumar -
2022 Spotlight: Private Adaptive Optimization with Side information »
Tian Li · Manzil Zaheer · Sashank Jakkam Reddi · Virginia Smith -
2022 Oral: Robust Training of Neural Networks Using Scale Invariant Architectures »
Zhiyuan Li · Srinadh Bhojanapalli · Manzil Zaheer · Sashank Jakkam Reddi · Sanjiv Kumar -
2021 Poster: A statistical perspective on distillation »
Aditya Menon · Ankit Singh Rawat · Sashank Jakkam Reddi · Seungyeon Kim · Sanjiv Kumar -
2021 Poster: Disentangling Sampling and Labeling Bias for Learning in Large-output Spaces »
Ankit Singh Rawat · Aditya Menon · Wittawat Jitkrittum · Sadeep Jayasumana · Felix Xinnan Yu · Sashank Jakkam Reddi · Sanjiv Kumar -
2021 Spotlight: A statistical perspective on distillation »
Aditya Menon · Ankit Singh Rawat · Sashank Jakkam Reddi · Seungyeon Kim · Sanjiv Kumar -
2021 Spotlight: Disentangling Sampling and Labeling Bias for Learning in Large-output Spaces »
Ankit Singh Rawat · Aditya Menon · Wittawat Jitkrittum · Sadeep Jayasumana · Felix Xinnan Yu · Sashank Jakkam Reddi · Sanjiv Kumar -
2021 Poster: Federated Composite Optimization »
Honglin Yuan · Manzil Zaheer · Sashank Jakkam Reddi -
2021 Spotlight: Federated Composite Optimization »
Honglin Yuan · Manzil Zaheer · Sashank Jakkam Reddi -
2020 Poster: Does label smoothing mitigate label noise? »
Michal Lukasik · Srinadh Bhojanapalli · Aditya Menon · Sanjiv Kumar -
2020 Poster: Low-Rank Bottleneck in Multi-head Attention Models »
Srinadh Bhojanapalli · Chulhee Yun · Ankit Singh Rawat · Sashank Jakkam Reddi · Sanjiv Kumar -
2020 Poster: Accelerating Large-Scale Inference with Anisotropic Vector Quantization »
Ruiqi Guo · Philip Sun · Erik Lindgren · Quan Geng · David Simcha · Felix Chern · Sanjiv Kumar -
2020 Poster: Supervised learning: no loss no cry »
Richard Nock · Aditya Menon -
2020 Poster: SCAFFOLD: Stochastic Controlled Averaging for Federated Learning »
Sai Praneeth Reddy Karimireddy · Satyen Kale · Mehryar Mohri · Sashank Jakkam Reddi · Sebastian Stich · Ananda Theertha Suresh -
2020 Poster: Federated Learning with Only Positive Labels »
Felix Xinnan Yu · Ankit Singh Rawat · Aditya Menon · Sanjiv Kumar -
2019 : Structured matrices for efficient deep learning »
Sanjiv Kumar -
2019 Poster: Fairness risk measures »
Robert C Williamson · Aditya Menon -
2019 Poster: Escaping Saddle Points with Adaptive Gradient Methods »
Matthew Staib · Sashank Jakkam Reddi · Satyen Kale · Sanjiv Kumar · Suvrit Sra -
2019 Oral: Escaping Saddle Points with Adaptive Gradient Methods »
Matthew Staib · Sashank Jakkam Reddi · Satyen Kale · Sanjiv Kumar · Suvrit Sra -
2019 Oral: Fairness risk measures »
Robert C Williamson · Aditya Menon -
2019 Poster: Monge blunts Bayes: Hardness Results for Adversarial Training »
Zac Cranko · Aditya Menon · Richard Nock · Cheng Soon Ong · Zhan Shi · Christian Walder -
2019 Poster: Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling »
Shanshan Wu · Alexandros Dimakis · Sujay Sanghavi · Felix Xinnan Yu · Daniel Holtmann-Rice · Dmitry Storcheus · Afshin Rostamizadeh · Sanjiv Kumar -
2019 Oral: Monge blunts Bayes: Hardness Results for Adversarial Training »
Zac Cranko · Aditya Menon · Richard Nock · Cheng Soon Ong · Zhan Shi · Christian Walder -
2019 Oral: Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling »
Shanshan Wu · Alexandros Dimakis · Sujay Sanghavi · Felix Xinnan Yu · Daniel Holtmann-Rice · Dmitry Storcheus · Afshin Rostamizadeh · Sanjiv Kumar -
2018 Poster: Loss Decomposition for Fast Learning in Large Output Spaces »
En-Hsu Yen · Satyen Kale · Felix Xinnan Yu · Daniel Holtmann-Rice · Sanjiv Kumar · Pradeep Ravikumar -
2018 Oral: Loss Decomposition for Fast Learning in Large Output Spaces »
En-Hsu Yen · Satyen Kale · Felix Xinnan Yu · Daniel Holtmann-Rice · Sanjiv Kumar · Pradeep Ravikumar -
2017 Poster: Stochastic Generative Hashing »
Bo Dai · Ruiqi Guo · Sanjiv Kumar · Niao He · Le Song -
2017 Talk: Stochastic Generative Hashing »
Bo Dai · Ruiqi Guo · Sanjiv Kumar · Niao He · Le Song -
2017 Poster: Distributed Mean Estimation with Limited Communication »
Ananda Theertha Suresh · Felix Xinnan Yu · Sanjiv Kumar · Brendan McMahan -
2017 Talk: Distributed Mean Estimation with Limited Communication »
Ananda Theertha Suresh · Felix Xinnan Yu · Sanjiv Kumar · Brendan McMahan