
Loss Decomposition for Fast Learning in Large Output Spaces
En-Hsu Yen · Satyen Kale · Felix Xinnan Yu · Daniel Holtmann-Rice · Sanjiv Kumar · Pradeep Ravikumar

Thu Jul 12 05:50 AM -- 06:00 AM (PDT) @ K11

For problems with large output spaces, evaluating the loss function and its gradient is expensive, typically taking time linear in the size of the output space. Recently, methods have been developed to speed up learning via efficient data structures for Nearest-Neighbor Search (NNS) or Maximum Inner-Product Search (MIPS). However, the performance of such data structures typically degrades in high dimensions. In this work, we propose a novel technique that reduces the intractable high-dimensional search problem to several much more tractable lower-dimensional ones via dual decomposition of the loss function. At the same time, we demonstrate guaranteed convergence to the original loss via a greedy message passing procedure. In our experiments on multiclass and multilabel classification with hundreds of thousands of classes, as well as training skip-gram word embeddings with a vocabulary size of half a million, our technique consistently improves the accuracy of search-based gradient approximation methods and outperforms sampling-based gradient approximation methods by a large margin.
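To make the O(K) cost concrete: a toy sketch (not the paper's algorithm) contrasting the exact softmax cross-entropy gradient, which touches all K classes, with a search-style approximation that keeps only the top-k largest inner products. The function names and the use of `argpartition` as a stand-in for a MIPS query are illustrative assumptions; a real MIPS index would retrieve the top-k classes without computing all K logits.

```python
import numpy as np

def softmax_grad_exact(W, x, y):
    """Exact cross-entropy gradient w.r.t. the logits: p - e_y. Costs O(K*d)."""
    logits = W @ x                    # all K inner products
    p = np.exp(logits - logits.max())
    p /= p.sum()
    g = p.copy()
    g[y] -= 1.0
    return g

def softmax_grad_topk(W, x, y, k):
    """Approximate gradient using only the k largest logits (plus the true
    class), as a search-based method would retrieve them. The result is
    k-sparse. Here we compute all logits for clarity; a MIPS structure
    would supply the top-k indices directly."""
    logits = W @ x
    idx = np.argpartition(logits, -k)[-k:]   # stand-in for a MIPS query
    idx = np.union1d(idx, [y])               # always keep the true class
    sub = logits[idx]
    p = np.exp(sub - sub.max())
    p /= p.sum()
    g = np.zeros_like(logits)
    g[idx] = p
    g[y] -= 1.0
    return g

rng = np.random.default_rng(0)
K, d = 1000, 32
W, x, y = rng.normal(size=(K, d)), rng.normal(size=d), 3
err = np.abs(softmax_grad_exact(W, x, y) - softmax_grad_topk(W, x, y, k=50)).max()
print(f"max abs gradient error with k=50 of {K} classes: {err:.4f}")
```

The gap the paper addresses is that when d is large, the MIPS query itself becomes unreliable; the proposed dual decomposition splits it into several lower-dimensional searches.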

Author Information

En-Hsu Yen (Carnegie Mellon University)

I am currently a PhD student in the Machine Learning Department of Carnegie Mellon University's School of Computer Science, working with Pradeep Ravikumar and Inderjit Dhillon. I received my B.S./B.B.A./M.S. degrees from the CSIE and IM departments of National Taiwan University, where I worked with Shou-De Lin. My research focuses on large-scale machine learning, convex optimization, and their applications.

Satyen Kale (Google Research)
Felix Xinnan Yu (Google AI)
Daniel Holtmann-Rice (Google Inc)
Sanjiv Kumar (Google Research, NY)
Pradeep Ravikumar (Carnegie Mellon University)
