Workshop
Workshop on eXtreme Classification: Theory and Applications
Anna Choromanska · John Langford · Maryam Majzoubi · Yashoteja Prabhu

Fri Jul 17 06:00 AM -- 04:00 PM (PDT)
Event URL: http://manikvarma.org/events/XC20/index.html

Extreme classification is a rapidly growing research area focusing on multi-class and multi-label problems where the label space is extremely large. It brings many diverse approaches under the same umbrella, drawing on natural language processing (NLP), computer vision, information retrieval, recommendation systems, computational advertising, and embedding methods. Extreme classifiers have been deployed in many real-world industrial applications, ranging from language modelling and document tagging in NLP to face recognition and learning universal feature representations in computer vision. Moreover, extreme classification finds application in recommendation, tagging, and ranking systems, since these problems can be reformulated as multi-label learning tasks where each item to be ranked or recommended is treated as a separate label. Such reformulations have led to significant gains over traditional collaborative filtering and content-based recommendation techniques.
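
As a concrete illustration of the reformulation above, here is a minimal sketch (the names and toy data are our own, not from any particular system) that recasts a recommendation log as a multi-label training target, with one label per item:

    # Hypothetical sketch: recasting recommendation as extreme multi-label
    # classification. Each item becomes a label; a user's interaction history
    # becomes that user's label vector. The data below is purely illustrative.
    import numpy as np
    from scipy.sparse import csr_matrix

    # Toy interaction log: (user_id, item_id) pairs.
    interactions = [(0, 2), (0, 5), (1, 0), (2, 5), (2, 3)]
    n_users, n_items = 3, 6  # n_items plays the role of the (extreme) label space

    rows, cols = zip(*interactions)
    # Y[u, i] = 1 iff user u interacted with item i: the multi-label target.
    Y = csr_matrix((np.ones(len(interactions)), (rows, cols)),
                   shape=(n_users, n_items))

    # A multi-label classifier trained on user features X to predict Y then
    # ranks items for a user by its per-label scores.
    print(Y.toarray())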

This workshop aims to offer a timely forum for researchers and practitioners working in the aforementioned fields of core supervised learning, the theory of extreme classification, and the application domains above. These topics are well covered by the Topics of Interest of ICML 2020. The workshop aims to bring together researchers interested in these areas to encourage discussion, facilitate interaction and collaboration, and improve upon the state of the art in extreme classification. The workshop will provide a plethora of opportunities for research discussion, including poster sessions, invited talks, contributed talks, and a panel. During the panel, the speakers will discuss challenges and opportunities in the field of extreme classification, in particular: 1) how to deal with the problem of long-tail labels; 2) how to effectively combine deep learning approaches with extreme multi-label classification techniques; and 3) how to develop the theoretical foundations of this area. We expect healthy participation from both industry and academia.

Fri 6:00 a.m. - 6:10 a.m.
Opening Remarks (Talk)
Yashoteja Prabhu, Maryam Majzoubi
Fri 6:10 a.m. - 6:15 a.m.
Introduction to Extreme Classification (Talk)
Manik Varma, Yashoteja Prabhu
Fri 6:15 a.m. - 6:45 a.m.
Invited Talk 1 - Manik Varma (Talk)

In this talk we propose the DeepXML framework for deep extreme multi-label learning and apply it to short-text document classification. We demonstrate that DeepXML can: (a) be used to analyze seemingly disparate deep extreme classifiers; (b) lead to improvements in leading algorithms such as XML-CNN & MACH when they are recast in the proposed framework; and (c) lead to a novel algorithm called Astec, which can be up to 12% more accurate and up to 40x faster to train than the state of the art for short-text document classification. Finally, we show that when flighted on Bing, Astec can be used for personalized search, ads and recommendation for billions of users. Astec can handle billions of events per day, can process more than a hundred thousand events per second, and leads to a significant improvement in key metrics compared to state-of-the-art methods in production.

Manik Varma
Fri 6:45 a.m. - 6:50 a.m.
Invited Talk 1 Q&A - Manik Varma (Q&A)
Manik Varma
Fri 6:50 a.m. - 7:20 a.m.
Invited Talk 2 - Tomas Mikolov (Talk)

In this talk, I will present several simple ideas that were proposed a long time ago to deal with extremely large output spaces in language modeling. These include various types of hierarchical softmax, as well as other approaches that decompose the labels into smaller parts, such as sub-word language modeling.

Tomas Mikolov
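
The talk covers several variants; as one hedged illustration, the following NumPy sketch shows the basic two-level ("class-based") hierarchical softmax idea, with toy sizes and a toy cluster assignment of our own choosing. Instead of one softmax over V labels, the probability factors as p(y) = p(c(y)) * p(y | c(y)), so each evaluation touches roughly sqrt(V) scores instead of V:

    import numpy as np

    rng = np.random.default_rng(0)
    V, d = 10_000, 64                    # label-space size, hidden size
    C = int(np.sqrt(V))                  # number of clusters
    cluster_of = np.arange(V) % C        # toy assignment of labels to clusters

    W_cluster = rng.normal(size=(C, d))  # cluster-level classifier
    W_label = rng.normal(size=(V, d))    # within-cluster classifiers

    def log_prob(h, y):
        """log p(y | h) under the two-level factorization."""
        c = cluster_of[y]
        # log p(c | h): a softmax over only C clusters.
        s = W_cluster @ h
        log_pc = s[c] - np.logaddexp.reduce(s)
        # log p(y | c, h): a softmax over only the labels inside cluster c.
        members = np.flatnonzero(cluster_of == c)
        t = W_label[members] @ h
        log_py = t[np.searchsorted(members, y)] - np.logaddexp.reduce(t)
        return log_pc + log_py

    h = rng.normal(size=d)
    print(log_prob(h, y=1234))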
Fri 7:20 a.m. - 7:25 a.m.
Invited Talk 2 Q&A - Tomas Mikolov (Q&A)
Tomas Mikolov
Fri 7:25 a.m. - 7:55 a.m.
Break 1 (Break)
Fri 7:55 a.m. - 8:00 a.m.
Speaker Introduction (Talk)
Fri 8:00 a.m. - 8:30 a.m.
Invited Talk 3 - Maryam Majzoubi (Talk)

We consider multi-label classification where the goal is to annotate each data point with the most relevant subset of labels from an extremely large label set. Efficient annotation can be achieved with balanced tree predictors, i.e. trees whose depth is logarithmic in the number of labels and whose leaves correspond to labels. Designing a prediction mechanism with such trees for real data applications is non-trivial, as it needs to accommodate sending examples to multiple leaves while at the same time sustaining high prediction accuracy. In this paper we develop the LdSM algorithm for the construction and training of multi-label decision trees, where in every node of the tree we optimize a novel objective function that favors balanced splits, maintains high class purity of children nodes, and allows sending examples in multiple directions but with a penalty that prevents tree over-growth. Each node of the tree is trained once the previous node is completed, leading to a streaming approach to training. We analyze the proposed objective theoretically and show that minimizing it leads to pure and balanced data splits. Furthermore, we show a boosting theorem that captures its connection to the multi-label classification error. Experimental results on benchmark data sets demonstrate that our approach achieves high prediction accuracy and low prediction time, and position LdSM as a competitive tool among existing state-of-the-art approaches.

Maryam Majzoubi
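
As a rough caricature of the kind of node objective the abstract describes (this is our own simplified stand-in, not the LdSM objective itself, and it omits the multi-way routing and over-growth penalty), one can score a candidate binary split by trading off balance against child purity:

    import numpy as np

    def node_score(left_counts, right_counts, alpha=1.0):
        """Score a candidate split from per-label example counts per child.

        Lower is better: zero balance means a perfect 50/50 split, and
        lower entropy means purer children. alpha weights purity vs. balance.
        """
        nL, nR = left_counts.sum(), right_counts.sum()
        n = nL + nR
        balance = abs(nL - nR) / n          # 0 when perfectly balanced

        def impurity(counts):
            p = counts / max(counts.sum(), 1)
            p = p[p > 0]
            return -(p * np.log(p)).sum()   # entropy of the child's label mix

        purity = (nL * impurity(left_counts) + nR * impurity(right_counts)) / n
        return balance + alpha * purity

    left = np.array([40, 5, 0])             # label histogram in the left child
    right = np.array([2, 30, 33])
    print(node_score(left, right))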
Fri 8:30 a.m. - 8:35 a.m.
Invited Talk 3 Q&A - Maryam Majzoubi (Q&A)
Maryam Majzoubi
Fri 8:35 a.m. - 9:05 a.m.
Invited Talk 4 - Alina Beygelzimer (Talk)

This talk is about a new learned dynamic memory controller for organizing prior experiences in a way that is empirically useful for a number of downstream tasks. The controller supports logarithmic time operations and can thus be integrated into existing statistical learning algorithms as an augmented memory unit without substantially increasing training and inference computation. It also supports optional reward reinforcement, which brings a steady improvement empirically. The controller operates as a reduction to online classification, allowing it to benefit from advances in representation or architecture. This is joint work with Wen Sun, Hal Daume, John Langford, and Paul Mineiro (published at ICML-2019).

Alina Beygelzimer
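
To make the logarithmic-time claim concrete, here is a minimal sketch of a tree-structured memory in which every internal node routes a query left or right, so insert and query each touch O(log n) nodes. The routers here are fixed random hyperplanes purely for illustration; in the actual work they are trained online via the reduction to classification mentioned above.

    import numpy as np

    class MemoryTree:
        def __init__(self, depth, dim, seed=0):
            rng = np.random.default_rng(seed)
            self.depth = depth
            self.routers = rng.normal(size=(2 ** depth - 1, dim))  # one per internal node
            self.leaves = [[] for _ in range(2 ** depth)]          # stored memories

        def _route(self, x):
            node = 0
            for _ in range(self.depth):              # O(depth) router evaluations
                go_right = self.routers[node] @ x > 0
                node = 2 * node + 1 + int(go_right)
            return node - (2 ** self.depth - 1)      # index of the reached leaf

        def insert(self, x, value):
            self.leaves[self._route(x)].append((x, value))

        def query(self, x):
            # Return the memory in the reached leaf that is closest to x.
            leaf = self.leaves[self._route(x)]
            return min(leaf, key=lambda kv: np.linalg.norm(kv[0] - x), default=None)

    mem = MemoryTree(depth=4, dim=8)
    rng = np.random.default_rng(1)
    for i in range(100):
        mem.insert(rng.normal(size=8), value=i)
    print(mem.query(rng.normal(size=8)))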
Fri 9:05 a.m. - 9:10 a.m.
Invited Talk 4 Q&A - Alina Beygelzimer (Q&A)
Alina Beygelzimer
Fri 9:10 a.m. - 10:30 a.m.
Lunch Break (Break)
Fri 10:30 a.m. - 10:35 a.m.
Speakers Introduction (Talk)
Fri 10:35 a.m. - 10:40 a.m.
Spotlight Talk 1 - Unbiased Estimates of Decomposable Losses for Extreme Classification With Missing Labels (Spotlight)
Erik Schultheis
Fri 10:40 a.m. - 10:45 a.m.
Spotlight Talk 2 - Online probabilistic label trees (Spotlight)
Marek Wydmuch
Fri 10:45 a.m. - 10:50 a.m.
Spotlight Talk 3 - Visualizing Classification Structure in Large-Scale Classifiers (Spotlight)
Bilal Alsallakh
Fri 10:50 a.m. - 10:55 a.m.
Spotlight Talk 4 - Generalizing across (in)visible spectrum (Spotlight)
Ruchit Rawal
Fri 10:55 a.m. - 11:00 a.m.
Spotlight Talk 5 - Extreme Regression for Ranking & Recommendation (Spotlight)
Yashoteja Prabhu
Fri 11:00 a.m. - 11:05 a.m.
Spotlight Talk 6 - Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization (Spotlight)
Kaidi Cao
Fri 11:05 a.m. - 12:05 p.m.
Break 2 (Break)
Fri 11:05 a.m. - 12:05 p.m.
Poster Session 1 (Poster Session)

Use the links and password below to join the Zoom poster sessions. Please do not share them in any chat or public forum, or with anyone not registered for ICML.

Password: xc2020p

Zoom Links:

Poster 1 - Unbiased Estimates of Decomposable Losses for Extreme Classification With Missing Labels

https://zoom.us/j/98614464694?pwd=bHorMlRZRmNkNXUvTnNjT3hKbnZkdz09


Poster 2 - Online probabilistic label trees

https://zoom.us/j/98050411730?pwd=VGxoakZsaWNLZFptZ3VrbzBKU1RRZz09


Poster 3 - Visualizing Classification Structure in Large-Scale Classifiers

https://zoom.us/j/92665878817?pwd=OE9MTG1od1lDMFpSa0Vtb3FOUjB3dz09


Poster 4 - Generalizing across (in)visible spectrum

https://zoom.us/j/97890519071?pwd=MVliMWpaSW1IazVaZXlqQjA2Zmp2Zz09


Poster 5 - Extreme Regression for Ranking & Recommendation

https://zoom.us/j/96685271629?pwd=MGdEN2k1YmtnMjZhUmNUZ241S2FFZz09


Poster 6 - Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization

https://zoom.us/j/92929137916?pwd=NmZiZ28veUF2SW93VDdrVEJrN1BZZz09

Fri 12:05 p.m. - 12:10 p.m.
Speaker Introduction (Talk)
Fri 12:10 p.m. - 12:45 p.m.
Invited Talk 5 - Inderjit Dhillon (Talk)

Many challenging problems in modern applications amount to finding relevant results from an enormous output space of potential candidates, for example, finding the best matching product from a large catalog or suggesting related search phrases on a search engine. The size of the output space for these problems can be in the millions to billions. Moreover, observational or training data is often limited for many of the so-called "long-tail" items in the output space. Given the inherent paucity of training data for most of the items in the output space, developing machine learned models that perform well for spaces of this size is challenging. Fortunately, items in the output space are often correlated, thereby presenting an opportunity to alleviate the data sparsity issue. In this talk, I will first discuss the challenges in modern multi-output prediction, including missing values, features associated with outputs, absence of negative examples, and the need to scale up to enormous data sets. Bilinear methods, such as Inductive Matrix Completion (IMC), enable us to handle missing values and output features in practice, while coming with theoretical guarantees. Nonlinear methods such as nonlinear IMC and DSSM (Deep Semantic Similarity Model) enable more powerful models that are used in practice in real-life applications. However, inference in these models scales linearly with the size of the output space. In order to scale up, I will present the Prediction for Enormous and Correlated Output Spaces (PECOS) framework, which performs prediction in three phases: (i) in the first phase, the output space is organized using a semantic indexing scheme, (ii) in the second phase, the indexing is used to narrow down the output space by orders of magnitude using a machine learned matching scheme, and (iii) in the third phase, the matched items are ranked by a final ranking scheme. The versatility and modularity of PECOS allows for easy plug-and-play of various choices for the indexing, matching, and ranking phases, and it is possible to ensemble various models, each arising from a particular choice for the three phases.

Inderjit Dhillon
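
The three phases can be sketched schematically as follows (a toy illustration under our own assumptions, not the PECOS library's API; in particular, PECOS builds its index by semantic clustering rather than the deterministic assignment used here):

    import numpy as np

    rng = np.random.default_rng(0)
    L, d, C = 100_000, 64, 300            # labels, embedding dim, index clusters

    label_emb = rng.normal(size=(L, d))   # stand-in label representations

    # (i) Indexing: organize the label space into clusters.
    cluster_of = np.arange(L) % C         # toy assignment of labels to clusters
    centroid = np.stack([label_emb[cluster_of == c].mean(axis=0) for c in range(C)])

    def predict(x, b=5, k=10):
        # (ii) Matching: keep the top-b clusters, shrinking the candidate
        # set from L labels to roughly b * L / C.
        top_clusters = np.argsort(centroid @ x)[-b:]
        candidates = np.flatnonzero(np.isin(cluster_of, top_clusters))
        # (iii) Ranking: score only the candidates and return the top-k labels.
        scores = label_emb[candidates] @ x
        return candidates[np.argsort(scores)[-k:][::-1]]

    print(predict(rng.normal(size=d)))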
Fri 12:45 p.m. - 12:50 p.m.
Invited Talk 5 Q&A - Inderjit Dhillon (Q&A)
Inderjit Dhillon
Fri 12:50 p.m. - 1:20 p.m.
Invited Talk 6 - Chicheng Zhang (Talk)

We create a computationally tractable algorithm for contextual bandit learning with one-dimensional continuous actions with unknown structure on the loss functions. In a nutshell, our algorithm, Continuous Action Tree with Smoothing (CATS), reduces continuous-action contextual bandit learning to cost-sensitive extreme multiclass classification, where each class corresponds to a discretized action. We show that CATS admits an online implementation that has low training and test time complexities per example, and enjoys statistical consistency guarantees under certain realizability assumptions. We also verify the efficiency and efficacy of CATS through large-scale experiments.

Chicheng Zhang
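
A hedged sketch of the reduction (with an illustrative boxcar smoothing kernel and toy numbers of our own; CATS additionally organizes the discrete actions in a tree for logarithmic train and test time):

    import numpy as np

    K, h = 64, 0.05                        # number of discrete actions, smoothing width
    actions = (np.arange(K) + 0.5) / K     # bin centers covering [0, 1]

    def cost_vector(a_obs, loss, p_obs):
        """Turn one round of bandit feedback into cost-sensitive labels.

        a_obs: continuous action that was played, loss: its observed loss,
        p_obs: probability density with which a_obs was played.
        """
        in_band = np.abs(actions - a_obs) <= h     # bins near the played action
        # Importance-weighted cost, smoothed over the band of width 2h.
        return np.where(in_band, loss / (2 * h * p_obs), 0.0)

    # One round: action 0.31 was played with density 1.2 and incurred loss 0.7.
    costs = cost_vector(0.31, 0.7, 1.2)
    # These costs feed a cost-sensitive multiclass learner over the K classes.
    print(np.flatnonzero(costs > 0))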
Fri 1:20 p.m. - 1:25 p.m.
Invited Talk 6 Q&A - Chicheng Zhang (Q&A)
Chicheng Zhang
Fri 1:25 p.m. - 2:10 p.m.
Break 3 (Break)
Fri 1:25 p.m. - 2:10 p.m.
Poster Session 2 (Poster Session)

Use the same Zoom links and password as in Poster Session 1 above. Please do not share them in any chat or public forum, or with anyone not registered for ICML.

Fri 2:10 p.m. - 2:15 p.m.
Speaker Introduction (Talk)
Fri 2:15 p.m. - 2:45 p.m.
Invited Talk 7 - Jure Leskovec (Talk)

Developing algorithms that are able to generalize to a novel task given only a few labeled examples represents a fundamental challenge in closing the gap between machine- and human-level performance. The core of human cognition lies in the structured, reusable concepts that help us to rapidly adapt to new tasks and provide reasoning behind our decisions. However, existing meta-learning methods learn complex representations across prior labeled tasks without imposing any structure on the learned representations. In this talk I will discuss how meta-learning methods can improve generalization ability by learning to learn along human-interpretable concept dimensions. Instead of learning a joint unstructured metric space, we learn mappings of high-level concepts into semi-structured metric spaces and effectively combine the outputs of independent concept learners. Experiments on diverse domains, including a benchmark image classification dataset and a novel single-cell dataset from a biological domain, show significant gains over strong meta-learning baselines.

Jure Leskovec
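
To make the combination step concrete, here is a minimal sketch under our own simplifying assumptions (linear concept embeddings standing in for small networks, and a plain sum over concepts): each concept learner maps the input into its own metric space, the input is scored against per-concept class prototypes, and the per-concept scores are combined.

    import numpy as np

    rng = np.random.default_rng(0)
    n_concepts, d_in, d_out, n_classes = 4, 32, 16, 5

    # One linear "concept learner" per concept dimension.
    W = rng.normal(size=(n_concepts, d_out, d_in))

    # Per-concept class prototypes, e.g. means of embedded support examples.
    prototypes = rng.normal(size=(n_concepts, n_classes, d_out))

    def classify(x):
        scores = np.zeros(n_classes)
        for c in range(n_concepts):
            z = W[c] @ x                                  # embed into concept space c
            dists = np.linalg.norm(prototypes[c] - z, axis=1)
            scores -= dists                               # nearer prototype, higher score
        return int(np.argmax(scores))                     # combine by summing over concepts

    print(classify(rng.normal(size=d_in)))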
Fri 2:45 p.m. - 2:50 p.m.
Invited Talk 7 Q&A - Jure Leskovec (Q&A)
Jure Leskovec
Fri 2:50 p.m. - 4:00 p.m.
Discussion Panel
Krzysztof Dembczynski, Prateek Jain, Alina Beygelzimer, Inderjit Dhillon, Anna Choromanska, Maryam Majzoubi, Yashoteja Prabhu, John Langford

Author Information

Anna Choromanska (NYU Tandon School of Engineering)
John Langford (Microsoft Research)
Maryam Majzoubi (New York University)
Yashoteja Prabhu (Microsoft Research India)
