Subset Selection in Machine Learning: From Theory to Applications

Workshop

Rishabh Iyer · Abir De · Ganesh Ramakrishnan · Jeff Bilmes

Sat 24 Jul, 6 a.m. PDT

A growing number of machine learning problems involve finding subsets of data points. Examples range from selecting subsets of labeled or unlabeled data points, to subsets of features or model parameters, to subsets of pixels, keypoints, or sentences in image segmentation, correspondence, and summarization problems. The workshop will cover a wide range of topics, from theoretical aspects of subset selection (e.g., coresets, submodularity, and determinantal point processes) to practical applications (e.g., time- and energy-efficient learning, learning under resource constraints, active learning, human-assisted learning, feature selection, model compression, and feature induction).

We believe this workshop is timely because (a) subset selection arises naturally in many of the above applications but has often been considered in isolation within each of them, and (b) by connecting researchers working in both the theoretical and application domains above, we can foster a much-needed discussion on reusing technical innovations across these subareas and applications. Furthermore, we would like to connect researchers working on the theoretical foundations of subset selection (in areas such as coresets and submodularity) with researchers working on applications (such as feature selection, active learning, data-efficient learning, model compression, and human-assisted machine learning).

Timezone: America/Los_Angeles

Schedule

Sat 6:15 a.m. - 6:30 a.m.	Introduction by the Organizers (Live Intro)	Abir De · Rishabh Iyer · Ganesh Ramakrishnan · Jeff Bilmes
Sat 6:30 a.m. - 7:00 a.m.	Introduction to Coresets and Open Problems (Invited Talk)	Dan Feldman
Sat 7:00 a.m. - 7:25 a.m.	Differentiable learning Under Algorithmic Triage (Invited Talk)	Manuel Gomez-Rodriguez
Sat 7:25 a.m. - 7:30 a.m.	Differentiable learning Under Algorithmic Triage Q&A (Live Q&A)
Sat 7:30 a.m. - 7:55 a.m.	Data Summarization via Bilevel Coresets (Invited Talk)	Andreas Krause
Sat 7:55 a.m. - 8:00 a.m.	Data Summarization via Bilevel Coresets: Live Q&A (Live Q&A)
Sat 8:00 a.m. - 8:25 a.m.	Learning Constraints from Examples (Invited Talk)	Luc De Raedt
Sat 8:25 a.m. - 8:30 a.m.	Learning Constraints from Examples Live Q&A (Live Q&A)
Sat 8:30 a.m. - 8:51 a.m.	Greedy and Its Friends (Invited Talk)	Amin Karbasi
Sat 8:51 a.m. - 9:00 a.m.	Greedy and Its Friends Live Q&A (Live Q&A)
Sat 9:00 a.m. - 9:30 a.m.	Poster Session 1 (Poster Session)
Sat 9:30 a.m. - 10:30 a.m.	Panel Discussion on Subset Selection for ML Problems in the Real World (Speakers, Organizers, and a few more invited panelists) (Panel Discussion)
Sat 10:30 a.m. - 10:44 a.m.	Benchmarks and Toolkits for Data Subset Selection in ML through DECILE: Part I (Invited Talk)	Rishabh Iyer
Sat 10:44 a.m. - 10:58 a.m.	Benchmarks and Toolkits for Data Subset Selection in ML through DECILE: Part II (Invited Talk)	Ganesh Ramakrishnan
Sat 10:58 a.m. - 11:00 a.m.	Benchmarks and Toolkits for Data Subset Selection in ML through DECILE: Live Q&A (Live Q&A)
Sat 11:00 a.m. - 11:20 a.m.	More Information, Less Data (Invited Talk)	Jeff Bilmes
Sat 11:20 a.m. - 11:30 a.m.	More Information, Less Data: Q&A Session (Live Q&A)
Sat 11:30 a.m. - 11:50 a.m.	Theory of feature selection (Invited Talk)	Rajiv Khanna
Sat 11:50 a.m. - 12:00 p.m.	Theory of feature selection Live Q&A (Live Q&A)
Sat 12:00 p.m. - 12:04 p.m.	Online and Non Parametric Coresets for Bregman Divergence (Spotlight)	Supratim Shit · Rachit Chhaya · Anirban Dasgupta · Jayesh Choudhari
Sat 12:04 p.m. - 12:09 p.m.	Unconstrained Submodular Maximization with Modular Costs: Tight Approximation and Application to Profit Maximization (Spotlight)	Tianyuan Jin · Yu Yang · Renchi Yang · Jieming Shi · Keke Huang · Xiaokui Xiao
Sat 12:09 p.m. - 12:14 p.m.	SVP-CF: Selection via Proxy for Collaborative Filtering Data (Spotlight)	Noveen Sachdeva · Julian McAuley · Carole-Jean Wu
Sat 12:14 p.m. - 12:19 p.m.	Bayesian decision analysis for collecting nearly-optimal subsets (Spotlight)	Daniel Kowal
Sat 12:19 p.m. - 12:23 p.m.	Fast Estimation Method for the Stability of Ensemble Feature Selectors (Spotlight)	Rina Onda · Kenta Oono
Sat 12:23 p.m. - 12:27 p.m.	Selective Focusing Learning in Conditional GANs (Spotlight)	Kyeongbo Kong · Kyunghun Kim · Woo-jin Song · Suk-Ju Kang
Sat 12:27 p.m. - 12:32 p.m.	Kernel Thinning (Spotlight)	Raaz Dwivedi · Lester Mackey
Sat 12:32 p.m. - 12:37 p.m.	Multiple-criteria Based Active Learning with Fixed-size Determinantal Point Processes (Spotlight)	Xueying ZHAN · Qing Li · Antoni Chan
Sat 12:37 p.m. - 12:42 p.m.	Coresets for Classification – Simplified and Strengthened (Spotlight)	Anup Rao · Tung Mai · Cameron Musco
Sat 12:42 p.m. - 12:47 p.m.	Using Machine Learning to Recognise Statistical Dependence (Spotlight)	Ubai Sandouk
Sat 12:47 p.m. - 12:52 p.m.	Sparsifying Transformer Models with Trainable Representation Pooling (Spotlight)	Michał Pietruszka · Łukasz Borchmann · Łukasz Garncarek
Sat 12:52 p.m. - 12:57 p.m.	Continual Learning via Function-Space Variational Inference: A Unifying View (Spotlight)	Tim G. J. Rudner · Freddie Bickford Smith · Qixuan Feng · Yee-Whye Teh · Yarin Gal
Sat 12:57 p.m. - 1:02 p.m.	Active Learning under Pool Set Distribution Shift and Noisy Data (Spotlight)	Andreas Kirsch · Tom Rainforth · Yarin Gal
Sat 1:02 p.m. - 1:07 p.m.	Sparse Bayesian Learning via Stepwise Regression (Spotlight)	Sebastian Ament · Carla Gomes
Sat 1:07 p.m. - 1:10 p.m.	Mitigating Memorization in Sample Selection for Learning with Noisy Labels (Spotlight)	Kyeongbo Kong · Junggi Lee · Youngchul Kwak · Young-Rae Cho · Seong-Eun Kim · Woo-jin Song
Sat 1:10 p.m. - 1:59 p.m.	Poster Session 2 (Poster Session)
Sat 1:59 p.m. - 2:00 p.m.	Introduction to Invited Talk (Live Intro)
Sat 2:00 p.m. - 2:29 p.m.	Data-efficient and Robust Learning from Massive Datasets (Invited Talk)	Baharan Mirzasoleiman
Sat 2:29 p.m. - 2:30 p.m.	Data-efficient and Robust Learning from Massive Datasets Live Q&A (Live Q&A)
Sat 2:30 p.m. - 2:50 p.m.	Computationally Efficient Data Selection for Deep Learning (Invited Talk)	Cody Coleman
Sat 2:50 p.m. - 3:00 p.m.	Computationally Efficient Data Selection for Deep Learning Live Q&A (Live Q&A)
Sat 3:00 p.m. - 3:05 p.m.	High-Dimensional Variable Selection and Non-Linear Interaction Discovery in Linear Time (Spotlight)	Raj Agrawal · Tamara Broderick
Sat 3:05 p.m. - 3:10 p.m.	Error-driven Fixed-Budget ASR Personalization for Accented Speakers (Spotlight)	Abhijeet Awasthi · Sunita Sarawagi · Preethi Jyothi
Sat 3:10 p.m. - 3:15 p.m.	MISNN: Multiple Imputation via Semi-parametric Neural Networks (Spotlight)	Zhiqi Bu · Zongyu Dai · Yiliang Zhang · Qi Long
Sat 3:15 p.m. - 3:20 p.m.	Towards Active Air Quality Station Deployment (Spotlight)	Zeel B Patel · Nipun Batra
Sat 3:20 p.m. - 3:25 p.m.	Core-set Sampling for Efficient Neural Architecture Search (Spotlight)	Jae-hun Shim · Kyeongbo Kong · Suk-Ju Kang
Sat 3:25 p.m. - 3:30 p.m.	On Coresets For Fair Regression (Spotlight)	Rachit Chhaya · Anirban Dasgupta · Supratim Shit · Jayesh Choudhari
Sat 3:30 p.m. - 3:35 p.m.	A Comparison of Contextual and Non-Contextual Preference Ranking for Set Addition Problems (Spotlight)	Timo Bertram · Johannes Fürnkranz · Martin Müller
Sat 3:35 p.m. - 3:40 p.m.	Statistical Measures For Defining Curriculum Scoring Function (Spotlight)	Vinu Sankar Sadasivan · Anirban Dasgupta
Sat 3:40 p.m. - 3:45 p.m.	An Extreme Point Approach to Subset Selection (Spotlight)	Viveck Cadambe · Bill Kay
Sat 3:45 p.m. - 3:50 p.m.	Tighter m-DPP Coreset Sample Complexity Bounds (Spotlight)	Gantavya Bhatt · Jeff Bilmes
Sat 3:50 p.m. - 3:55 p.m.	SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios (Spotlight)	Suraj Kothawade · Krishnateja Killamsetty · Rishabh Iyer
Sat 3:55 p.m. - 3:59 p.m.	Minimax Optimization: The Case of Convex-Submodular (Spotlight)	Arman Adibi · Aryan Mokhtari · Hamed Hassani
Sat 3:59 p.m. - 4:04 p.m.	Improved Regret Bounds for Online Submodular Maximization (Spotlight)	Omid Sadeghi · Maryam Fazel
Sat 4:04 p.m. - 4:09 p.m.	Differentially Private Monotone Submodular Maximization Under Matroid and Knapsack Constraints (Spotlight)	Omid Sadeghi · Maryam Fazel
Sat 4:09 p.m. - 4:14 p.m.	Effective Evaluation of Deep Active Learning on Image Classification Tasks (Spotlight)	Nathan Beck · Durga Sivasubramanian · Ganesh Ramakrishnan · Rishabh Iyer
Sat 4:14 p.m. - 4:19 p.m.	Active Learning Convex Halfspaces on Graphs (Spotlight)	Maximilian Thiessen · Thomas Gärtner
Sat 4:19 p.m. - 4:23 p.m.	Parallel Quasi-concave set optimization: A new frontier that scales without needing submodularity (Spotlight)	Praneeth Vepakomma · Ramesh Raskar
Sat 4:23 p.m. - 4:30 p.m.	Concluding Remarks (Live Intro)
-	Data efficiency in graph networks through equivariance (Poster)	Francesco Farina · Emma Slade
-	SubsetGAN: Pattern detection in the activation space for Identifying Synthesised Content (Poster)	Celia Cintas · Skyler Speakman · Girmaw Abebe Tadesse · Victor Akinwande · Kommy Weldemariam
-	Ordinal Embedding for Sets (Poster)	Aissatou Diallo · Johannes Fürnkranz
-	Differentiable architecture pruning for transfer learning (Poster)	Nicolo Colombo · Yang Gao
-	When does loss-based prioritization fail? (Poster)	Niel Hu · Xinyu Hu · Rosanne Liu · Sara Hooker · Jason Yosinski
-	Geometrical Homogeneous Clustering for Image Data Reduction (Poster)	Shril Mody · Janvi Thakkar · Devvrat Joshi · Siddharth Soni · Nipun Batra · Rohan Patil
-	Interactive Teaching for Imbalanced Data Summarization (Poster)	Farhad Pourkamali-Anaraki · Walter Bennette
-	A Practical Notation for Information-Theoretic Quantities between Outcomes and Random Variables (Poster)	Andreas Kirsch · Yarin Gal
-	Learning to Delegate for Large-scale Vehicle Routing (Poster)	Sirui Li · Zhongxia Yan · Cathy Wu
-	Multi-objective diversification via Submodular Counterfactual Scoring for Track Sequencing on Spotify (Poster)	Rishabh Mehrotra
-	A Data Subset Selection Framework for Efficient Hyper-Parameter Tuning and Automatic Machine Learning (Poster)	Savan Amitbhai Visalpara · Krishnateja Killamsetty · Rishabh Iyer
-	GoldiProx Selection: Faster training by learning what is learnable, not yet learned, and worth learning (Poster)	Sören Mindermann · Muhammed Razzak · Adrien Morisot · Aidan Gomez · Sebastian Farquhar · Jan Brauner · Yarin Gal
-	Online and Non Parametric Coresets for Bregman Divergence (Poster)	Supratim Shit · Rachit Chhaya · Anirban Dasgupta · Jayesh Choudhari
-	Unconstrained Submodular Maximization with Modular Costs: Tight Approximation and Application to Profit Maximization (Poster)	Tianyuan Jin · Yu Yang · Renchi Yang · Jieming Shi · Keke Huang · Xiaokui Xiao
-	SVP-CF: Selection via Proxy for Collaborative Filtering Data (Poster)	Noveen Sachdeva · Julian McAuley · Carole-Jean Wu
-	Bayesian decision analysis for collecting nearly-optimal subsets (Poster)	Daniel Kowal
-	Fast Estimation Method for the Stability of Ensemble Feature Selectors (Poster)	Rina Onda · Kenta Oono
-	Selective Focusing Learning in Conditional GANs (Poster)	Kyeongbo Kong · Kyunghun Kim · Woo-jin Song · Suk-Ju Kang
-	Kernel Thinning (Poster)	Raaz Dwivedi · Lester Mackey
-	Multiple-criteria Based Active Learning with Fixed-size Determinantal Point Processes (Poster)	Xueying ZHAN · Qing Li · Antoni Chan
-	Coresets for Classification – Simplified and Strengthened (Poster)	Anup Rao · Tung Mai · Cameron Musco
-	Using Machine Learning to Recognise Statistical Dependence (Poster)	Ubai Sandouk
-	Mitigating Memorization in Sample Selection for Learning with Noisy Labels (Poster)	Kyeongbo Kong · Junggi Lee · Youngchul Kwak · Young-Rae Cho · Seong-Eun Kim · Woo-jin Song
-	Sparsifying Transformer Models with Trainable Representation Pooling (Poster)	Michał Pietruszka · Łukasz Borchmann · Łukasz Garncarek
-	Continual Learning via Function-Space Variational Inference: A Unifying View (Poster)	Tim G. J. Rudner · Freddie Bickford Smith · Qixuan Feng · Yee-Whye Teh · Yarin Gal
-	Active Learning under Pool Set Distribution Shift and Noisy Data (Poster)	Andreas Kirsch · Tom Rainforth · Yarin Gal
-	Batch Active Learning with Stochastic Acquisition Functions (Poster)	Andreas Kirsch · Sebastian Farquhar · Yarin Gal
-	Sparse Bayesian Learning via Stepwise Regression (Poster)	Sebastian Ament · Carla Gomes
-	High-Dimensional Variable Selection and Non-Linear Interaction Discovery in Linear Time (Poster)	Raj Agrawal · Tamara Broderick
-	Error-driven Fixed-Budget ASR Personalization for Accented Speakers (Poster)	Abhijeet Awasthi · Sunita Sarawagi · Preethi Jyothi
-	MISNN: Multiple Imputation via Semi-parametric Neural Networks (Poster)	Zhiqi Bu · Zongyu Dai · Yiliang Zhang · Qi Long
-	Towards Active Air Quality Station Deployment (Poster)	Zeel B Patel · Nipun Batra
-	Core-set Sampling for Efficient Neural Architecture Search (Poster)	Jae-hun Shim · Kyeongbo Kong · Suk-Ju Kang
-	On Coresets For Fair Regression (Poster)	Rachit Chhaya · Anirban Dasgupta · Supratim Shit · Jayesh Choudhari
-	A Comparison of Contextual and Non-Contextual Preference Ranking for Set Addition Problems (Poster)	Timo Bertram · Johannes Fürnkranz · Martin Müller
-	Statistical Measures For Defining Curriculum Scoring Function (Poster)	Vinu Sankar Sadasivan · Anirban Dasgupta
-	An Extreme Point Approach to Subset Selection (Poster)	Viveck Cadambe · Bill Kay
-	Tighter m-DPP Coreset Sample Complexity Bounds (Poster)	Gantavya Bhatt · Jeff Bilmes
-	SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios (Poster)	Suraj Kothawade · Krishnateja Killamsetty · Rishabh Iyer
-	Minimax Optimization: The Case of Convex-Submodular (Poster)	Arman Adibi · Aryan Mokhtari · Hamed Hassani
-	Improved Regret Bounds for Online Submodular Maximization (Poster)	Omid Sadeghi · Maryam Fazel
-	Differentially Private Monotone Submodular Maximization Under Matroid and Knapsack Constraints (Poster)	Omid Sadeghi · Maryam Fazel
-	Effective Evaluation of Deep Active Learning on Image Classification Tasks (Poster)	Nathan Beck · Durga Sivasubramanian · Ganesh Ramakrishnan · Rishabh Iyer
-	Active Learning Convex Halfspaces on Graphs (Poster)	Maximilian Thiessen · Thomas Gärtner
-	Parallel Quasi-concave set optimization: A new frontier that scales without needing submodularity (Poster)	Praneeth Vepakomma · Ramesh Raskar