While deep learning (DL) models are state-of-the-art in text and image domains, they have not yet consistently outperformed Gradient Boosted Decision Trees (GBDTs) on tabular Learning-To-Rank (LTR) problems. Most of the recent performance gains attained by DL models in text and image tasks have used unsupervised pretraining, which exploits orders of magnitude more unlabeled data than labeled data. To the best of our knowledge, unsupervised pretraining has not been applied to the LTR problem, which often produces vast amounts of unlabeled data. In this work, we study whether unsupervised pretraining can improve LTR performance over GBDTs and other non-pretrained models. Using simple design choices, including SimCLR-Rank, our ranking-specific modification of SimCLR (an unsupervised pretraining method for images), we produce pretrained deep learning models that soundly outperform GBDTs (and other non-pretrained models) when labeled data is vastly outnumbered by unlabeled data. We also show that pretrained models often achieve significantly better robustness than non-pretrained models (GBDTs or DL models) when ranking outlier data.
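The abstract does not spell out how SimCLR-Rank modifies SimCLR, so the sketch below shows only the standard SimCLR NT-Xent contrastive objective, applied to tabular feature vectors under our own assumptions. The augmentation `mask_features` (randomly zeroing features) is hypothetical and not necessarily what the paper uses, and raw features stand in for the encoder outputs a real pipeline would feed into the loss.

```python
import math
import random
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    # Inputs are assumed already L2-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def nt_xent_loss(z1: List[Vector], z2: List[Vector], tau: float = 0.5) -> float:
    """Standard SimCLR NT-Xent loss over N positive pairs (z1[i], z2[i]).

    Every other embedding in the 2N-sized batch acts as a negative.
    """
    assert len(z1) == len(z2) and len(z1) > 1
    z = [l2_normalize(v) for v in z1 + z2]  # 2N embeddings
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same row
        # Similarities to all embeddings except itself (positive included).
        logits = [cosine(z[i], z[k]) / tau for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / tau
        # Numerically stable log-sum-exp over the denominator terms.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - pos_logit
    return total / (2 * n)


def mask_features(x: Vector, p: float, rng: random.Random) -> Vector:
    """Hypothetical tabular augmentation: zero each feature with probability p."""
    return [0.0 if rng.random() < p else v for v in x]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy query-document feature rows (illustrative values only).
    batch = [[1.0, 0.2, 3.0, 0.0], [0.0, 2.0, 0.5, 1.0], [4.0, 0.0, 0.0, 2.0]]
    view1 = [mask_features(x, 0.25, rng) for x in batch]
    view2 = [mask_features(x, 0.25, rng) for x in batch]
    print(nt_xent_loss(view1, view2, tau=0.5))
```

Each row's two corrupted copies form a positive pair, and every other row in the batch serves as a negative; a smaller `tau` makes the softmax over similarities sharper. In practice the views would pass through a shared neural encoder and projection head before `nt_xent_loss`, with gradients flowing back into the encoder.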
Author Information
Charlie Hou (Carnegie Mellon University)
Kiran Thekumparampil (Amazon)
Michael Shavlovsky (University of California, Santa Cruz)
Giulia Fanti (CMU)
Yesh Dattatreya (Georgia Institute of Technology)
Sujay Sanghavi (UT Austin)
More from the Same Authors
2021: Multistage stepsize schedule in Federated Learning: Bridging Theory and Practice
Charlie Hou · Kiran Thekumparampil
2022: Positive Unlabeled Contrastive Representation Learning
Anish Acharya · Sujay Sanghavi · Li Jing · Bhargav Bhushanam · Michael Rabbat · Dhruv Choudhary · Inderjit Dhillon
2023: UCB Provably Learns From Inconsistent Human Feedback
Shuo Yang · Tongzheng Ren · Inderjit Dhillon · Sujay Sanghavi
2023: Contextual Set Selection Under Human Feedback With Model Misspecification
Shuo Yang · Rajat Sen · Sujay Sanghavi
2023 Poster: Beyond Uniform Lipschitz Condition in Differentially Private Optimization
Rudrajit Das · Satyen Kale · Zheng Xu · Tong Zhang · Sujay Sanghavi
2023 Poster: Understanding Self-Distillation in the Presence of Label Noise
Rudrajit Das · Sujay Sanghavi
2022 Poster: Asymptotically-Optimal Gaussian Bandits with Side Observations
Alexia Atsidakou · Orestis Papadigenopoulos · Constantine Caramanis · Sujay Sanghavi · Sanjay Shakkottai
2022 Spotlight: Asymptotically-Optimal Gaussian Bandits with Side Observations
Alexia Atsidakou · Orestis Papadigenopoulos · Constantine Caramanis · Sujay Sanghavi · Sanjay Shakkottai
2022 Poster: Linear Bandit Algorithms with Sublinear Time Complexity
Shuo Yang · Tongzheng Ren · Sanjay Shakkottai · Eric Price · Inderjit Dhillon · Sujay Sanghavi
2022 Spotlight: Linear Bandit Algorithms with Sublinear Time Complexity
Shuo Yang · Tongzheng Ren · Sanjay Shakkottai · Eric Price · Inderjit Dhillon · Sujay Sanghavi
2020 Poster: InfoGAN-CR and ModelCentrality: Self-supervised Model Training and Selection for Disentangling GANs
Zinan Lin · Kiran Thekumparampil · Giulia Fanti · Sewoong Oh
2020 Poster: Extreme Multi-label Classification from Aggregated Labels
Yanyao Shen · Hsiang-Fu Yu · Sujay Sanghavi · Inderjit Dhillon
2019 Poster: Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling
Shanshan Wu · Alexandros Dimakis · Sujay Sanghavi · Felix Xinnan Yu · Daniel Holtmann-Rice · Dmitry Storcheus · Afshin Rostamizadeh · Sanjiv Kumar
2019 Poster: Learning with Bad Training Data via Iterative Trimmed Loss Minimization
Yanyao Shen · Sujay Sanghavi
2019 Oral: Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling
Shanshan Wu · Alexandros Dimakis · Sujay Sanghavi · Felix Xinnan Yu · Daniel Holtmann-Rice · Dmitry Storcheus · Afshin Rostamizadeh · Sanjiv Kumar
2019 Oral: Learning with Bad Training Data via Iterative Trimmed Loss Minimization
Yanyao Shen · Sujay Sanghavi