Keywords: eXtreme classification, multi-class classification, multi-label classification, large-scale learning
Extreme classification is a rapidly growing research area focusing on multi-class and multi-label problems where the label space is extremely large. It brings under the same umbrella many diverse approaches from natural language processing (NLP), computer vision, information retrieval, recommendation systems, computational advertising, and embedding methods. Extreme classifiers have been deployed in many real-world industrial applications, ranging from language modelling to document tagging in NLP, and from face recognition to learning universal feature representations in computer vision. Moreover, extreme classification finds application in recommendation, tagging, and ranking systems, since these problems can be reformulated as multi-label learning tasks in which each item to be ranked or recommended is treated as a separate label. Such reformulations have led to significant gains over traditional collaborative filtering and content-based recommendation techniques.
The proposed workshop aims to offer a timely collection of information to benefit researchers and practitioners working in the aforementioned fields of core supervised learning, the theory of extreme classification, and the application domains. These issues are well covered by the Topics of Interest of ICML 2020. The workshop aims to bring together researchers interested in these areas to encourage discussion, facilitate interaction and collaboration, and improve upon the state of the art in extreme classification. The workshop will provide a plethora of opportunities for research discussion, including poster sessions, invited talks, contributed talks, and a panel. During the panel, the speakers will discuss challenges and opportunities in the field of extreme classification, in particular: 1) how to deal with the problem of long-tail labels; 2) how to effectively combine deep learning approaches with extreme multi-label classification techniques; and 3) how to develop the theoretical foundations for this area. We expect healthy participation from both industry and academia.
Fri 6:00 a.m. - 6:10 a.m. | Opening Remarks (Talk) | Yashoteja Prabhu · Maryam Majzoubi
Fri 6:10 a.m. - 6:15 a.m. | Introduction to Extreme Classification (Talk) | Manik Varma · Yashoteja Prabhu
Fri 6:15 a.m. - 6:45 a.m. | Invited Talk 1 - DeepXML: A Framework for Deep Extreme Multi-label Learning - Manik Varma (Talk)
In this talk we propose the DeepXML framework for deep extreme multi-label learning and apply it to short-text document classification. We demonstrate that DeepXML can: (a) be used to analyze seemingly disparate deep extreme classifiers; (b) lead to improvements in leading algorithms such as XML-CNN & MACH when they are recast in the proposed framework; and (c) lead to a novel algorithm, Astec, which can be up to 12% more accurate and up to 40x faster to train than the state of the art for short-text document classification. Finally, we show that when flighted on Bing, Astec can be used for personalized search, ads, and recommendation for billions of users. Astec can handle billions of events per day, can process more than a hundred thousand events per second, and leads to a significant improvement in key metrics compared to state-of-the-art methods in production.
Manik Varma
Fri 6:45 a.m. - 6:50 a.m. | Invited Talk 1 Q&A - Manik Varma (Q&A) | Manik Varma
Fri 6:50 a.m. - 7:20 a.m. | Invited Talk 2 - Historical perspective on extreme classification in language modeling - Tomas Mikolov (Talk)
In this talk, I will present several simple ideas that were proposed long ago to deal with extremely large output spaces in language modeling. These include various types of hierarchical softmax, as well as other approaches that decompose the labels into smaller parts, such as sub-word language modeling.
Tomas Mikolov
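To make the best known of these ideas concrete, here is a minimal NumPy sketch of a two-level ("class-based") hierarchical softmax; the variable names and the random class assignment are illustrative assumptions, not material from the talk. Instead of normalizing over all V words at O(V) cost per step, the model predicts one of roughly sqrt(V) classes and then a word within that class, for O(sqrt(V)) cost per step.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 10_000, 64                   # vocabulary size, hidden size
C = int(np.sqrt(V))                 # number of word classes (~sqrt(V))
word2class = rng.integers(0, C, V)  # assumed class assignment (often frequency-based)

W_class = rng.normal(0, 0.01, (C, d))  # class scoring weights
W_word = rng.normal(0, 0.01, (V, d))   # within-class word scoring weights

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def word_prob(h, w):
    """P(w | h) = P(class(w) | h) * P(w | class(w), h)."""
    c = word2class[w]
    p_class = softmax(W_class @ h)[c]
    members = np.flatnonzero(word2class == c)     # words belonging to class c
    p_within = softmax(W_word[members] @ h)       # softmax over ~sqrt(V) words only
    return p_class * p_within[np.searchsorted(members, w)]

h = rng.normal(size=d)              # hidden state from the language model
print(word_prob(h, 42))
```

A real implementation would precompute the member list per class and train both weight matrices by gradient descent on the log-probability of observed words; the two-level factorization is what cuts the per-step cost.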
Fri 7:20 a.m. - 7:25 a.m. | Invited Talk 2 Q&A - Tomas Mikolov (Q&A) | Tomas Mikolov
Fri 7:25 a.m. - 7:55 a.m. | Break 1
Fri 7:55 a.m. - 8:00 a.m. | Speaker Introduction (Talk)
Fri 8:00 a.m. - 8:30 a.m. | Invited Talk 3 - Extreme Classification with Logarithmic-depth Streaming Multi-label Decision Trees - Maryam Majzoubi (Talk)
We consider multi-label classification, where the goal is to annotate each data point with the most relevant subset of labels from an extremely large label set. Efficient annotation can be achieved with balanced tree predictors, i.e. trees with logarithmic depth in the label complexity, whose leaves correspond to labels. Designing a prediction mechanism with such trees for real-data applications is non-trivial, as it needs to accommodate sending examples to multiple leaves while at the same time sustaining high prediction accuracy. In this paper we develop the LdSM algorithm for the construction and training of multi-label decision trees, where in every node of the tree we optimize a novel objective function that favors balanced splits, maintains high class purity of child nodes, and allows sending examples in multiple directions but with a penalty that prevents tree over-growth. Each node of the tree is trained once the previous node is completed, leading to a streaming approach for training. We analyze the proposed objective theoretically and show that minimizing it leads to pure and balanced data splits. Furthermore, we show a boosting theorem that captures its connection to the multi-label classification error. Experimental results on benchmark datasets demonstrate that our approach achieves high prediction accuracy and low prediction time, and position LdSM as a competitive tool among existing state-of-the-art approaches.
Maryam Majzoubi
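As a rough illustration of the kind of node objective the abstract describes (rewarding balanced splits with pure children), here is a hypothetical toy scoring function; the actual LdSM objective, its multi-way routing, and its over-growth penalty differ from this simplification.

```python
import numpy as np

def split_score(Y, go_left, alpha=1.0, beta=1.0):
    """Y: (n, L) 0/1 label matrix; go_left: (n,) boolean routing of examples.
    Returns a score that is high for balanced splits with differing children."""
    n = len(go_left)
    n_left = int(go_left.sum())
    if n_left == 0 or n_left == n:                  # degenerate split: reject
        return -np.inf
    balance = 1.0 - abs(2.0 * n_left / n - 1.0)     # 1.0 when perfectly balanced
    p_left = Y[go_left].mean(axis=0)                # per-label frequency, left child
    p_right = Y[~go_left].mean(axis=0)              # per-label frequency, right child
    purity = np.abs(p_left - p_right).mean()        # high when children specialize
    return alpha * balance + beta * purity

# Toy usage: examples labeled 0-4 go left, 5-9 go right -> high score.
rng = np.random.default_rng(0)
Y = np.zeros((100, 10))
Y[np.arange(100), rng.integers(0, 10, 100)] = 1
go_left = Y[:, :5].sum(axis=1) > 0
print(split_score(Y, go_left))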
Fri 8:30 a.m. - 8:35 a.m. | Invited Talk 3 Q&A - Maryam Majzoubi (Q&A) | Maryam Majzoubi
Fri 8:35 a.m. - 9:05 a.m. | Invited Talk 4 - Contextual Memory Trees - Alina Beygelzimer (Talk)
This talk is about a new learned dynamic memory controller for organizing prior experiences in a way that is empirically useful for a number of downstream tasks. The controller supports logarithmic-time operations and can thus be integrated into existing statistical learning algorithms as an augmented memory unit without substantially increasing training and inference computation. It also supports optional reward reinforcement, which brings a steady improvement empirically. The controller operates as a reduction to online classification, allowing it to benefit from advances in representation or architecture. This is joint work with Wen Sun, Hal Daume, John Langford, and Paul Mineiro (published at ICML 2019).
Alina Beygelzimer
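To make the logarithmic-time idea concrete, here is a hypothetical NumPy sketch of a tree-structured memory with linear routers at internal nodes. In the actual work the routers are learned online via the reduction to classification and support reward reinforcement; this sketch uses fixed random routers and omits all learning.

```python
import numpy as np

class Node:
    def __init__(self, d, rng):
        self.w = rng.normal(0, 0.1, d)   # router weights (learned online in CMT)
        self.left = self.right = None
        self.items = []                  # (key, value) pairs stored at leaves

class MemoryTree:
    def __init__(self, d, leaf_cap=4, seed=0):
        self.d, self.leaf_cap = d, leaf_cap
        self.rng = np.random.default_rng(seed)
        self.root = Node(d, self.rng)

    def _route(self, node, x):
        return node.right if node.w @ x > 0 else node.left

    def insert(self, x, v):
        node = self.root
        while node.left is not None:     # descend to a leaf: O(depth)
            node = self._route(node, x)
        node.items.append((x, v))
        if len(node.items) > self.leaf_cap:          # split an overfull leaf
            node.left = Node(self.d, self.rng)
            node.right = Node(self.d, self.rng)
            items, node.items = node.items, []
            for xi, vi in items:         # redistribute stored items by the router
                self._route(node, xi).items.append((xi, vi))

    def query(self, x):
        node = self.root
        while node.left is not None:
            node = self._route(node, x)
        if not node.items:               # leaf may be empty right after a split
            return None
        return max(node.items, key=lambda kv: kv[0] @ x)[1]  # best match in leaf
```

With balanced routing, both insert and query touch O(log n) nodes, which is what lets such a memory sit inside a learner without blowing up training or inference cost.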
Fri 9:05 a.m. - 9:10 a.m. | Invited Talk 4 Q&A - Alina Beygelzimer (Q&A) | Alina Beygelzimer
Fri 9:10 a.m. - 10:30 a.m. | Lunch Break
Fri 10:30 a.m. - 10:35 a.m. | Speakers Introduction (Talk)
Fri 10:35 a.m. - 10:40 a.m. | Spotlight Talk 1 - Unbiased Estimates of Decomposable Losses for Extreme Classification With Missing Labels (Spotlight) | Erik Schultheis
Fri 10:40 a.m. - 10:45 a.m. | Spotlight Talk 2 - Online probabilistic label trees (Spotlight) | Marek Wydmuch
Fri 10:45 a.m. - 10:50 a.m. | Spotlight Talk 3 - Visualizing Classification Structure in Large-Scale Classifiers (Spotlight) | Bilal Alsallakh
Fri 10:50 a.m. - 10:55 a.m. | Spotlight Talk 4 - Generalizing across (in)visible spectrum (Spotlight) | Ruchit Rawal
Fri 10:55 a.m. - 11:00 a.m. | Spotlight Talk 5 - Extreme Regression for Ranking & Recommendation (Spotlight) | Yashoteja Prabhu
Fri 11:00 a.m. - 11:05 a.m. | Spotlight Talk 6 - Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization (Spotlight) | Kaidi Cao
Fri 11:05 a.m. - 12:05 p.m. | Break 2
Fri 11:05 a.m. - 12:05 p.m. | Poster Session (Poster)
Use the link and password below to join the Zoom poster sessions. Please do not share the links or password on any chat or public forum, or with any person unregistered for ICML. Password: xc2020p
Zoom Links:
Poster 1 - Unbiased Estimates of Decomposable Losses for Extreme Classification With Missing Labels [ protected link dropped ]
Poster 2 - Online probabilistic label trees [ protected link dropped ]
Poster 3 - Visualizing Classification Structure in Large-Scale Classifiers [ protected link dropped ]
Poster 4 - Generalizing across (in)visible spectrum [ protected link dropped ]
Poster 5 - Extreme Regression for Ranking & Recommendation [ protected link dropped ]
Poster 6 - Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization [ protected link dropped ]
Fri 12:05 p.m. - 12:10 p.m. | Speaker Introduction (Talk)
Fri 12:10 p.m. - 12:45 p.m. | Invited Talk 5 - Multi-Output Prediction: Theory and Practice - Inderjit Dhillon (Talk)
Many challenging problems in modern applications amount to finding relevant results from an enormous output space of potential candidates, for example, finding the best matching product from a large catalog or suggesting related search phrases on a search engine. The size of the output space for these problems can be in the millions to billions. Moreover, observational or training data is often limited for many of the so-called “long-tail” items in the output space. Given the inherent paucity of training data for most of the items in the output space, developing machine-learned models that perform well for spaces of this size is challenging. Fortunately, items in the output space are often correlated, thereby presenting an opportunity to alleviate the data sparsity issue. In this talk, I will first discuss the challenges in modern multi-output prediction, including missing values, features associated with outputs, absence of negative examples, and the need to scale up to enormous datasets. Bilinear methods, such as Inductive Matrix Completion (IMC), enable us to handle missing values and output features in practice, while coming with theoretical guarantees. Nonlinear methods such as nonlinear IMC and DSSM (Deep Semantic Similarity Model) enable more powerful models that are used in real-life applications. However, inference in these models scales linearly with the size of the output space. In order to scale up, I will present the Prediction for Enormous and Correlated Output Spaces (PECOS) framework, which performs prediction in three phases: (i) the output space is organized using a semantic indexing scheme; (ii) the indexing is used to narrow down the output space by orders of magnitude using a machine-learned matching scheme; and (iii) the matched items are ranked by a final ranking scheme. The versatility and modularity of PECOS allow for easy plug-and-play of various choices for the indexing, matching, and ranking phases, and it is possible to ensemble various models, each arising from a particular choice for the three phases.
Inderjit Dhillon
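The three-phase recipe can be sketched in a few lines. The toy NumPy code below is an assumption-laden simplification (nearest-center indexing, untrained random linear matchers and rankers), meant only to show how indexing, matching, and ranking compose; it is not the PECOS implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, L, K = 1000, 32, 500, 16           # examples, features, labels, clusters

X = rng.normal(size=(n, d))
Y = (rng.random((n, L)) < 0.01).astype(float)   # sparse binary label matrix
label_emb = Y.T @ X                       # crude label embeddings from positives

# Phase 1: semantic indexing -- group the L labels into K clusters.
centers = label_emb[rng.choice(L, K, replace=False)]
cluster = np.argmax(label_emb @ centers.T, axis=1)        # label -> cluster id

# Phase 2: matching -- score clusters for a query and keep the top b.
W_match = rng.normal(size=(K, d))         # would be trained on (x, cluster) pairs
def match(x, b=3):
    return np.argsort(W_match @ x)[-b:]

# Phase 3: ranking -- rank only the labels inside the matched clusters.
W_rank = rng.normal(size=(L, d))          # would be trained per label
def predict(x, b=3, topk=5):
    cand = np.flatnonzero(np.isin(cluster, match(x, b)))  # shortlisted labels
    return cand[np.argsort(W_rank[cand] @ x)[-topk:][::-1]]

print(predict(X[0]))
```

The point of the structure is that the ranker only ever scores the shortlist from the matcher, so inference cost no longer grows linearly with the full output space.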
Fri 12:45 p.m. - 12:50 p.m. | Invited Talk 5 Q&A - Inderjit Dhillon (Q&A) | Inderjit Dhillon
Fri 12:50 p.m. - 1:20 p.m. | Invited Talk 6 - Efficient continuous-action contextual bandits via reduction to extreme multiclass classification - Chicheng Zhang (Talk)
We create a computationally tractable algorithm for contextual bandit learning with one-dimensional continuous actions and unknown structure on the loss functions. In a nutshell, our algorithm, Continuous Action Tree with Smoothing (CATS), reduces continuous-action contextual bandit learning to cost-sensitive extreme multiclass classification, where each class corresponds to a discretized action. We show that CATS admits an online implementation that has low training and test time complexities per example, and enjoys statistical consistency guarantees under certain realizability assumptions. We also verify the efficiency and efficacy of CATS through large-scale experiments.
Chicheng Zhang
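A minimal sketch of the reduction, under assumed details: discretize the action range into K bins, smooth the observed loss over a bandwidth h so that neighboring actions share credit, and hand the resulting cost vector to a cost-sensitive multiclass learner. CATS additionally organizes the K classes in a tree to get logarithmic per-example cost, which this sketch omits.

```python
import numpy as np

def smoothed_costs(a_taken, loss, K=64, lo=0.0, hi=1.0, h=0.05):
    """Build a cost vector over K discretized actions from one observed
    (action, loss) pair, using a box smoothing kernel of half-width h."""
    centers = lo + (np.arange(K) + 0.5) * (hi - lo) / K   # bin centers
    in_band = np.abs(centers - a_taken) <= h              # bins near the action
    costs = np.ones(K)                # unobserved bins get the maximal cost
    costs[in_band] = loss             # nearby bins share the observed loss
    return centers, costs

# One bandit round: we played action 0.37 and observed loss 0.2.
centers, costs = smoothed_costs(a_taken=0.37, loss=0.2)
print(centers[np.argmin(costs)])      # a cost-sensitive classifier would fit this
```

In the full algorithm these per-round cost vectors (with appropriate importance weighting) are fed to an online cost-sensitive multiclass learner whose classes are the K discretized actions.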
Fri 1:20 p.m. - 1:25 p.m. | Invited Talk 6 Q&A - Chicheng Zhang (Q&A) | Chicheng Zhang
Fri 1:25 p.m. - 2:10 p.m. | Break 3
Fri 1:25 p.m. - 2:10 p.m. | Poster Session (Poster)
Use the link and password below to join the Zoom poster sessions. Please do not share the links or password on any chat or public forum, or with any person unregistered for ICML. Password: xc2020p
Zoom Links:
Poster 1 - Unbiased Estimates of Decomposable Losses for Extreme Classification With Missing Labels [ protected link dropped ]
Poster 2 - Online probabilistic label trees [ protected link dropped ]
Poster 3 - Visualizing Classification Structure in Large-Scale Classifiers [ protected link dropped ]
Poster 4 - Generalizing across (in)visible spectrum [ protected link dropped ]
Poster 5 - Extreme Regression for Ranking & Recommendation [ protected link dropped ]
Poster 6 - Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization [ protected link dropped ]
Fri 2:10 p.m. - 2:15 p.m. | Speaker Introduction (Talk)
Fri 2:15 p.m. - 2:45 p.m. | Invited Talk 7 - Generalizing to Novel Tasks in the Low-Data Regime - Jure Leskovec (Talk)
Developing algorithms that are able to generalize to a novel task given only a few labeled examples represents a fundamental challenge in closing the gap between machine- and human-level performance. The core of human cognition lies in the structured, reusable concepts that help us rapidly adapt to new tasks and provide reasoning behind our decisions. However, existing meta-learning methods learn complex representations across prior labeled tasks without imposing any structure on the learned representations. In this talk I will discuss how meta-learning methods can improve generalization ability by learning to learn along human-interpretable concept dimensions. Instead of learning a joint unstructured metric space, we learn mappings of high-level concepts into semi-structured metric spaces and effectively combine the outputs of independent concept learners. Experiments on diverse domains, including a benchmark image classification dataset and a novel single-cell dataset from a biological domain, show significant gains over strong meta-learning baselines.
Jure Leskovec
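As a rough illustration of combining independent concept learners, here is a hypothetical nearest-prototype sketch in which each "concept" is a separate embedding space and per-concept scores are averaged; the learned mappings, their weighting, and the meta-training procedure in the talk differ from this toy version.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_concepts, emb = 128, 4, 16

# Each concept learner is sketched as a fixed linear map (would be meta-learned).
concept_maps = [rng.normal(0, 0.1, (emb, d)) for _ in range(n_concepts)]

def classify(query, support_x, support_y, weights=None):
    """Nearest-prototype classification, averaged across concept spaces."""
    classes = np.unique(support_y)
    weights = np.ones(n_concepts) / n_concepts if weights is None else weights
    scores = np.zeros(len(classes))
    for w, M in zip(weights, concept_maps):
        zq = M @ query                                  # query in this concept space
        for i, c in enumerate(classes):
            proto = (support_x[support_y == c] @ M.T).mean(axis=0)  # class prototype
            scores[i] -= w * np.sum((zq - proto) ** 2)  # closer prototype = higher score
    return classes[np.argmax(scores)]

# Tiny 2-way, 3-shot episode:
sx = rng.normal(size=(6, d))
sy = np.array([0, 0, 0, 1, 1, 1])
print(classify(rng.normal(size=d), sx, sy))
```

Keeping the concept spaces separate is what makes the per-concept distances individually inspectable, which is the interpretability argument in the abstract.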
Fri 2:45 p.m. - 2:50 p.m. | Invited Talk 7 Q&A - Jure Leskovec (Q&A) | Jure Leskovec
Fri 2:50 p.m. - 4:00 p.m. | Discussion Panel | Krzysztof Dembczynski · Prateek Jain · Alina Beygelzimer · Inderjit Dhillon · Anna Choromanska · Maryam Majzoubi · Yashoteja Prabhu · John Langford