For problems with large output spaces, evaluating the loss function and its gradient is expensive, typically taking time linear in the size of the output space. Recently, methods have been developed to speed up learning via efficient data structures for Nearest-Neighbor Search (NNS) or Maximum Inner-Product Search (MIPS). However, the performance of such data structures typically degrades in high dimensions. In this work, we propose a novel technique to reduce the intractable high-dimensional search problem to several much more tractable lower-dimensional ones via dual decomposition of the loss function. At the same time, we demonstrate guaranteed convergence to the original loss via a greedy message passing procedure. In our experiments on multiclass and multilabel classification with hundreds of thousands of classes, as well as training skip-gram word embeddings with a vocabulary size of half a million, our technique consistently improves the accuracy of search-based gradient approximation methods and outperforms sampling-based gradient approximation methods by a large margin.
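As a rough illustration of why a high-dimensional inner-product search can be split into lower-dimensional ones, the sketch below partitions the embedding coordinates into blocks and checks that the per-block partial scores sum to the full score. It is not the paper's algorithm: the sizes `K`, `D`, `B` and the helper `block_scores` are hypothetical, and the brute-force scoring stands in for whatever MIPS data structure would be used per block. It also shows that the per-block maximizers need not agree, so summing per-block maxima only gives an upper bound on the true maximum score. That gap is the kind of disagreement a dual decomposition with message passing is meant to close.

```python
import numpy as np

def block_scores(W, x, num_blocks):
    """Split each inner product <w_k, x> into per-block partial scores.

    W: (K, D) class embeddings, x: (D,) input embedding.
    Returns an array of shape (num_blocks, K).
    """
    col_blocks = np.array_split(np.arange(W.shape[1]), num_blocks)
    return np.stack([W[:, idx] @ x[idx] for idx in col_blocks])

rng = np.random.default_rng(0)
K, D, B = 1000, 64, 4            # hypothetical sizes, chosen for illustration
W = rng.standard_normal((K, D))
x = rng.standard_normal(D)

full = W @ x                     # exhaustive scoring, O(K * D)
parts = block_scores(W, x, B)    # B scoring problems, each in D / B dimensions

# The partial scores add up to the full score for every class.
decomposition_ok = np.allclose(parts.sum(axis=0), full)

# Maximizing each block independently can pick different classes,
# so the sum of per-block maxima upper-bounds the true maximum.
upper_bound = parts.max(axis=1).sum()
exact_max = full.max()
```

Each block could in principle be served by its own lower-dimensional search index; the sketch only verifies the additive identity and the resulting bound.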
Author Information
En-Hsu Yen (Carnegie Mellon University)
I am currently a PhD student in the School of Computer Science at Carnegie Mellon University (Machine Learning Department), working with Pradeep Ravikumar and Inderjit Dhillon. I received my B.S./B.B.A./M.S. from the CSIE/IM departments of National Taiwan University, where I worked with Shou-De Lin. My research focuses on Large-Scale Machine Learning, Convex Optimization, and their applications.
Satyen Kale (Google Research)
Felix Xinnan Yu (Google AI)
Daniel Holtmann-Rice (Google Inc)
Sanjiv Kumar (Google Research, NY)
Pradeep Ravikumar (Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
-
2018 Poster: Loss Decomposition for Fast Learning in Large Output Spaces »
Thu. Jul 12th 04:15 -- 07:00 PM Room Hall B #186
More from the Same Authors
-
2021 : Learning with User-Level Privacy »
Daniel A Levy · Ziteng Sun · Kareem Amin · Satyen Kale · Alex Kulesza · Mehryar Mohri · Ananda Theertha Suresh -
2021 : When Is Generalizable Reinforcement Learning Tractable? »
Dhruv Malik · Yuanzhi Li · Pradeep Ravikumar -
2023 : Identifying Causal Mechanism Shifts among Nonlinear Additive Noise Models »
Tianyu Chen · Kevin Bello · Bryon Aragam · Pradeep Ravikumar -
2023 : Learning Linear Causal Representations from Interventions under General Nonlinear Mixing »
Simon Buchholz · Goutham Rajendran · Elan Rosenfeld · Bryon Aragam · Bernhard Schölkopf · Pradeep Ravikumar -
2023 : Learning with Explanation Constraints »
Rattana Pukdee · Dylan Sam · Nina Balcan · Pradeep Ravikumar -
2023 : SpecTr: Fast Speculative Decoding via Optimal Transport »
Ziteng Sun · Ananda Suresh · Jae Ro · Ahmad Beirami · Himanshu Jain · Felix Xinnan Yu · Michael Riley · Sanjiv Kumar -
2023 : Global Optimality in Bivariate Gradient-based DAG Learning »
Chang Deng · Kevin Bello · Pradeep Ravikumar · Bryon Aragam -
2023 Poster: Beyond Uniform Lipschitz Condition in Differentially Private Optimization »
Rudrajit Das · Satyen Kale · Zheng Xu · Tong Zhang · Sujay Sanghavi -
2023 Poster: Optimizing NOTEARS Objectives via Topological Swaps »
Chang Deng · Kevin Bello · Bryon Aragam · Pradeep Ravikumar -
2023 Poster: Representer Point Selection for Explaining Regularized High-dimensional Models »
Che-Ping Tsai · Jiong Zhang · Hsiang-Fu Yu · Eli Chien · Cho-Jui Hsieh · Pradeep Ravikumar -
2023 Poster: On the Convergence of Federated Averaging with Cyclic Client Participation »
Yae Jee Cho · Pranay Sharma · Gauri Joshi · Zheng Xu · Satyen Kale · Tong Zhang -
2023 Poster: Efficient Training of Language Models using Few-Shot Learning »
Sashank Jakkam Reddi · Sobhan Miryoosefi · Stefani Karp · Shankar Krishnan · Satyen Kale · Seungyeon Kim · Sanjiv Kumar -
2023 Poster: Faith-Shap: The Faithful Shapley Interaction Index »
Che-Ping Tsai · Chih-Kuan Yeh · Pradeep Ravikumar -
2022 Poster: In defense of dual-encoders for neural ranking »
Aditya Menon · Sadeep Jayasumana · Ankit Singh Rawat · Seungyeon Kim · Sashank Jakkam Reddi · Sanjiv Kumar -
2022 Poster: Building Robust Ensembles via Margin Boosting »
Dinghuai Zhang · Hongyang Zhang · Aaron Courville · Yoshua Bengio · Pradeep Ravikumar · Arun Sai Suggala -
2022 Spotlight: Building Robust Ensembles via Margin Boosting »
Dinghuai Zhang · Hongyang Zhang · Aaron Courville · Yoshua Bengio · Pradeep Ravikumar · Arun Sai Suggala -
2022 Spotlight: In defense of dual-encoders for neural ranking »
Aditya Menon · Sadeep Jayasumana · Ankit Singh Rawat · Seungyeon Kim · Sashank Jakkam Reddi · Sanjiv Kumar -
2022 Poster: Agnostic Learnability of Halfspaces via Logistic Loss »
Ziwei Ji · Kwangjun Ahn · Pranjal Awasthi · Satyen Kale · Stefani Karp -
2022 Poster: Robust Training of Neural Networks Using Scale Invariant Architectures »
Zhiyuan Li · Srinadh Bhojanapalli · Manzil Zaheer · Sashank Jakkam Reddi · Sanjiv Kumar -
2022 Oral: Agnostic Learnability of Halfspaces via Logistic Loss »
Ziwei Ji · Kwangjun Ahn · Pranjal Awasthi · Satyen Kale · Stefani Karp -
2022 Oral: Robust Training of Neural Networks Using Scale Invariant Architectures »
Zhiyuan Li · Srinadh Bhojanapalli · Manzil Zaheer · Sashank Jakkam Reddi · Sanjiv Kumar -
2022 Poster: Correlated Quantization for Distributed Mean Estimation and Optimization »
Ananda Suresh · Ziteng Sun · Jae Ro · Felix Xinnan Yu -
2022 Spotlight: Correlated Quantization for Distributed Mean Estimation and Optimization »
Ananda Suresh · Ziteng Sun · Jae Ro · Felix Xinnan Yu -
2021 Poster: DORO: Distributional and Outlier Robust Optimization »
Runtian Zhai · Chen Dan · Zico Kolter · Pradeep Ravikumar -
2021 Spotlight: DORO: Distributional and Outlier Robust Optimization »
Runtian Zhai · Chen Dan · Zico Kolter · Pradeep Ravikumar -
2021 Poster: A statistical perspective on distillation »
Aditya Menon · Ankit Singh Rawat · Sashank Jakkam Reddi · Seungyeon Kim · Sanjiv Kumar -
2021 Poster: Disentangling Sampling and Labeling Bias for Learning in Large-output Spaces »
Ankit Singh Rawat · Aditya Menon · Wittawat Jitkrittum · Sadeep Jayasumana · Felix Xinnan Yu · Sashank Jakkam Reddi · Sanjiv Kumar -
2021 Spotlight: A statistical perspective on distillation »
Aditya Menon · Ankit Singh Rawat · Sashank Jakkam Reddi · Seungyeon Kim · Sanjiv Kumar -
2021 Spotlight: Disentangling Sampling and Labeling Bias for Learning in Large-output Spaces »
Ankit Singh Rawat · Aditya Menon · Wittawat Jitkrittum · Sadeep Jayasumana · Felix Xinnan Yu · Sashank Jakkam Reddi · Sanjiv Kumar -
2021 Poster: On Proximal Policy Optimization's Heavy-tailed Gradients »
Saurabh Garg · Joshua Zhanson · Emilio Parisotto · Adarsh Prasad · Zico Kolter · Zachary Lipton · Sivaraman Balakrishnan · Ruslan Salakhutdinov · Pradeep Ravikumar -
2021 Spotlight: On Proximal Policy Optimization's Heavy-tailed Gradients »
Saurabh Garg · Joshua Zhanson · Emilio Parisotto · Adarsh Prasad · Zico Kolter · Zachary Lipton · Sivaraman Balakrishnan · Ruslan Salakhutdinov · Pradeep Ravikumar -
2020 Poster: Uniform Convergence of Rank-weighted Learning »
Justin Khim · Liu Leqi · Adarsh Prasad · Pradeep Ravikumar -
2020 Poster: Does label smoothing mitigate label noise? »
Michal Lukasik · Srinadh Bhojanapalli · Aditya Menon · Sanjiv Kumar -
2020 Poster: Low-Rank Bottleneck in Multi-head Attention Models »
Srinadh Bhojanapalli · Chulhee Yun · Ankit Singh Rawat · Sashank Jakkam Reddi · Sanjiv Kumar -
2020 Poster: Sharp Statistical Guarantees for Adversarially Robust Gaussian Classification »
Chen Dan · Yuting Wei · Pradeep Ravikumar -
2020 Poster: Class-Weighted Classification: Trade-offs and Robust Approaches »
Ziyu Xu · Chen Dan · Justin Khim · Pradeep Ravikumar -
2020 Poster: Accelerating Large-Scale Inference with Anisotropic Vector Quantization »
Ruiqi Guo · Philip Sun · Erik Lindgren · Quan Geng · David Simcha · Felix Chern · Sanjiv Kumar -
2020 Poster: SCAFFOLD: Stochastic Controlled Averaging for Federated Learning »
Sai Praneeth Reddy Karimireddy · Satyen Kale · Mehryar Mohri · Sashank Jakkam Reddi · Sebastian Stich · Ananda Theertha Suresh -
2020 Poster: Federated Learning with Only Positive Labels »
Felix Xinnan Yu · Ankit Singh Rawat · Aditya Menon · Sanjiv Kumar -
2020 Poster: Certified Robustness to Label-Flipping Attacks via Randomized Smoothing »
Elan Rosenfeld · Ezra Winston · Pradeep Ravikumar · Zico Kolter -
2019 : Structured matrices for efficient deep learning »
Sanjiv Kumar -
2019 Poster: Escaping Saddle Points with Adaptive Gradient Methods »
Matthew Staib · Sashank Jakkam Reddi · Satyen Kale · Sanjiv Kumar · Suvrit Sra -
2019 Oral: Escaping Saddle Points with Adaptive Gradient Methods »
Matthew Staib · Sashank Jakkam Reddi · Satyen Kale · Sanjiv Kumar · Suvrit Sra -
2019 Poster: Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling »
Shanshan Wu · Alexandros Dimakis · Sujay Sanghavi · Felix Xinnan Yu · Daniel Holtmann-Rice · Dmitry Storcheus · Afshin Rostamizadeh · Sanjiv Kumar -
2019 Oral: Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling »
Shanshan Wu · Alexandros Dimakis · Sujay Sanghavi · Felix Xinnan Yu · Daniel Holtmann-Rice · Dmitry Storcheus · Afshin Rostamizadeh · Sanjiv Kumar -
2018 Poster: Binary Classification with Karmic, Threshold-Quasi-Concave Metrics »
Bowei Yan · Sanmi Koyejo · Kai Zhong · Pradeep Ravikumar -
2018 Oral: Binary Classification with Karmic, Threshold-Quasi-Concave Metrics »
Bowei Yan · Sanmi Koyejo · Kai Zhong · Pradeep Ravikumar -
2018 Poster: Deep Density Destructors »
David Inouye · Pradeep Ravikumar -
2018 Oral: Deep Density Destructors »
David Inouye · Pradeep Ravikumar -
2017 Poster: Stochastic Generative Hashing »
Bo Dai · Ruiqi Guo · Sanjiv Kumar · Niao He · Le Song -
2017 Talk: Stochastic Generative Hashing »
Bo Dai · Ruiqi Guo · Sanjiv Kumar · Niao He · Le Song -
2017 Poster: Distributed Mean Estimation with Limited Communication »
Ananda Theertha Suresh · Felix Xinnan Yu · Sanjiv Kumar · Brendan McMahan -
2017 Poster: Adaptive Feature Selection: Computationally Efficient Online Sparse Linear Regression under RIP »
Satyen Kale · Zohar Karnin · Tengyuan Liang · David Pal -
2017 Talk: Distributed Mean Estimation with Limited Communication »
Ananda Theertha Suresh · Felix Xinnan Yu · Sanjiv Kumar · Brendan McMahan -
2017 Poster: Ordinal Graphical Models: A Tale of Two Approaches »
Arun Sai Suggala · Eunho Yang · Pradeep Ravikumar -
2017 Poster: Doubly Greedy Primal-Dual Coordinate Descent for Sparse Empirical Risk Minimization »
Qi Lei · En-Hsu Yen · Chao-Yuan Wu · Inderjit Dhillon · Pradeep Ravikumar -
2017 Poster: Latent Feature Lasso »
En-Hsu Yen · Wei-Cheng Lee · Sung-En Chang · Arun Suggala · Shou-De Lin · Pradeep Ravikumar -
2017 Talk: Adaptive Feature Selection: Computationally Efficient Online Sparse Linear Regression under RIP »
Satyen Kale · Zohar Karnin · Tengyuan Liang · David Pal -
2017 Talk: Doubly Greedy Primal-Dual Coordinate Descent for Sparse Empirical Risk Minimization »
Qi Lei · En-Hsu Yen · Chao-Yuan Wu · Inderjit Dhillon · Pradeep Ravikumar -
2017 Talk: Ordinal Graphical Models: A Tale of Two Approaches »
Arun Sai Suggala · Eunho Yang · Pradeep Ravikumar -
2017 Talk: Latent Feature Lasso »
En-Hsu Yen · Wei-Cheng Lee · Sung-En Chang · Arun Suggala · Shou-De Lin · Pradeep Ravikumar