A coreset is a small weighted subset of an input set that approximates its loss function for a given set of queries. Coresets have become prevalent in machine learning, as they have been shown to be advantageous for many applications. Unfortunately, coresets are constructed in a problem-dependent manner: for each problem, a new coreset construction algorithm is suggested, and proving its correctness can take years. Even the generic frameworks require additional problem-dependent computations or proofs from the user. Moreover, many problems do not have (provable) small coresets, which limits their applicability. To this end, we suggest an automatic, practical framework for constructing coresets that requires only the input data and the desired cost function from the user, with no other task-related computation on the user's side. To do so, we reduce the problem of approximating a loss function to an instance of vector summation approximation, where the vectors to be summed are the loss vectors of the input points over a specific subset of the queries, so that we approximate the image of the function on this subset. We show that although this subset is limited, the resulting coreset is quite general. We also conduct an extensive experimental study on various machine learning applications. Finally, we provide a "plug and play" style implementation: a user-friendly system that can easily be used to apply coresets to many problems. We believe these contributions enable future research and make coresets easier to use and apply.
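To make the reduction above concrete, here is a minimal sketch under stated assumptions: each input point is mapped to its loss vector over a fixed, finite query set, and the sum of those vectors is then approximated by a weighted subset of points. The summation step uses plain norm-proportional importance sampling, a stand-in chosen for illustration; it is not the construction used by the authors. The function names `loss_vectors`, `sample_vector_sum_coreset`, and `weighted_sum` are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, TypeVar

P = TypeVar("P")  # type of an input point (hypothetical)
Q = TypeVar("Q")  # type of a query (hypothetical)


def loss_vectors(points: Sequence[P], queries: Sequence[Q],
                 loss: Callable[[P, Q], float]) -> List[List[float]]:
    """One vector per input point: entry j is loss(point, queries[j])."""
    return [[float(loss(p, q)) for q in queries] for p in points]


def sample_vector_sum_coreset(vectors: List[List[float]], m: int,
                              seed: int = 0) -> Dict[int, float]:
    """Stand-in summarizer (not the authors' method): draw m indices with
    replacement, with probability proportional to each vector's Euclidean
    norm, and return {index: weight}. The weights make the weighted sum an
    unbiased estimate of the exact vector sum."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    total = sum(norms)
    if total == 0.0:
        # All loss vectors are zero, so the empty subset sums exactly.
        return {}
    probs = [n / total for n in norms]
    rng = random.Random(seed)
    picks = rng.choices(range(len(vectors)), weights=probs, k=m)
    weights: Dict[int, float] = {}
    for i in picks:
        # Each draw contributes v_i / (m * p_i); repeated draws accumulate.
        # Zero-probability indices are never drawn, so p_i > 0 here.
        weights[i] = weights.get(i, 0.0) + 1.0 / (m * probs[i])
    return weights


def weighted_sum(vectors: List[List[float]],
                 weights: Dict[int, float]) -> List[float]:
    """Weighted sum of the selected loss vectors: the coreset's estimate of
    the total loss at every query in the fixed query set."""
    out = [0.0] * len(vectors[0])
    for i, w in weights.items():
        for j, x in enumerate(vectors[i]):
            out[j] += w * x
    return out


if __name__ == "__main__":
    # Toy 1-D example with a squared-distance loss (hypothetical data).
    points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
    queries = [0.0, 5.0, 11.0]
    vecs = loss_vectors(points, queries, lambda p, q: (p - q) ** 2)
    w = sample_vector_sum_coreset(vecs, m=3, seed=0)
    print("exact:   ", [sum(col) for col in zip(*vecs)])
    print("estimate:", weighted_sum(vecs, w))
```

Sampling proportional to the vector norms is used here because, among importance distributions for estimating a vector sum, it minimizes the expected squared error of the estimate. Evaluating the returned weighted subset on queries outside `queries` is where the abstract's claim that the coreset generalizes beyond the limited query subset would be tested.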
Author Information
Alaa Maalouf (MIT)
Morad Tukan (DataHeroes)
Vladimir Braverman (Johns Hopkins University)
Daniela Rus (MIT CSAIL)
More from the Same Authors
2021: Adversarial Robustness of Streaming Algorithms through Importance Sampling
Vladimir Braverman · Avinatan Hasidim · Yossi Matias · Mariano Schain · Sandeep Silwal · Samson Zhou
2021: Bi-directional Adaptive Communication for Heterogenous Distributed Learning
Dmitrii Avdiukhin · Vladimir Braverman
2021: Gap-Dependent Unsupervised Exploration for Reinforcement Learning
Jingfeng Wu · Vladimir Braverman · Lin Yang
2021: Is Bang-Bang Control All You Need?
Tim Seyde · Igor Gilitschenski · Wilko Schwarting · Bartolomeo Stellato · Martin Riedmiller · Markus Wulfmeier · Daniela Rus
2022: The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade
2023: Adversarial Training in Continuous-Time Models and Irregularly Sampled Time-Series
Alvin Li · Mathias Lechner · Alexander Amini · Daniela Rus
2023: Risk-Aware Image Generation by Estimating and Propagating Uncertainty
Alejandro Perez · Iaroslav Elistratov · Fynn Schmitt-Ulms · Ege Demir · Sadhana Lolla · Elaheh Ahmadi · Daniela Rus · Alexander Amini
2023 Poster: Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron
Jingfeng Wu · Difan Zou · Zixiang Chen · Vladimir Braverman · Quanquan Gu · Sham Kakade
2023 Poster: Provable Data Subset Selection For Efficient Neural Networks Training
Morad Tukan · Samson Zhou · Alaa Maalouf · Daniela Rus · Vladimir Braverman · Dan Feldman
2023 Poster: On the Forward Invariance of Neural ODEs
Wei Xiao · Johnson Tsun-Hsuan Wang · Ramin Hasani · Mathias Lechner · Yutong Ban · Chuang Gan · Daniela Rus
2023 Poster: Dataset Distillation with Convexified Implicit Gradients
Noel Loo · Ramin Hasani · Mathias Lechner · Daniela Rus
2022 Poster: Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade
2022 Oral: Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade
2021: Invited Talk 2: Addressing Model Bias and Uncertainty via Evidential Deep Learning
Daniela Rus
2021 Poster: The Logical Options Framework
Brandon Araki · Xiao Li · Kiran Vodrahalli · Jonathan DeCastro · Micah Fry · Daniela Rus
2021 Poster: On-Off Center-Surround Receptive Fields for Accurate and Robust Image Classification
Zahra Babaiee · Ramin Hasani · Mathias Lechner · Daniela Rus · Radu Grosu
2021 Spotlight: On-Off Center-Surround Receptive Fields for Accurate and Robust Image Classification
Zahra Babaiee · Ramin Hasani · Mathias Lechner · Daniela Rus · Radu Grosu
2021 Oral: The Logical Options Framework
Brandon Araki · Xiao Li · Kiran Vodrahalli · Jonathan DeCastro · Micah Fry · Daniela Rus
2020 Poster: A Natural Lottery Ticket Winner: Reinforcement Learning with Ordinary Neural Circuits
Ramin Hasani · Mathias Lechner · Alexander Amini · Daniela Rus · Radu Grosu
2020 Poster: Coresets for Clustering in Graphs of Bounded Treewidth
Daniel Baker · Vladimir Braverman · Lingxiao Huang · Shaofeng H.-C. Jiang · Robert Krauthgamer · Xuan Wu
2020 Poster: Sets Clustering
Ibrahim Jubran · Morad Tukan · Alaa Maalouf · Dan Feldman
2020 Poster: Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control
Jie Xu · Yunsheng Tian · Pingchuan Ma · Daniela Rus · Shinjiro Sueda · Wojciech Matusik
2020 Poster: Schatten Norms in Matrix Streams: Hello Sparsity, Goodbye Dimension
Vladimir Braverman · Robert Krauthgamer · Aditya Krishnan · Roi Sinoff
2020 Poster: Obtaining Adjustable Regularization for Free via Iterate Averaging
Jingfeng Wu · Vladimir Braverman · Lin Yang
2020 Poster: On the Noisy Gradient Descent that Generalizes as SGD
Jingfeng Wu · Wenqing Hu · Haoyi Xiong · Jun Huan · Vladimir Braverman · Zhanxing Zhu
2020 Poster: FetchSGD: Communication-Efficient Federated Learning with Sketching
Daniel Rothchild · Ashwinee Panda · Enayat Ullah · Nikita Ivkin · Ion Stoica · Vladimir Braverman · Joseph E Gonzalez · Raman Arora
2019 Poster: Coresets for Ordered Weighted Clustering
Vladimir Braverman · Shaofeng Jiang · Robert Krauthgamer · Xuan Wu
2019 Oral: Coresets for Ordered Weighted Clustering
Vladimir Braverman · Shaofeng Jiang · Robert Krauthgamer · Xuan Wu
2018 Poster: Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order
Vladimir Braverman · Stephen Chestnut · Robert Krauthgamer · Yi Li · David Woodruff · Lin Yang
2018 Oral: Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order
Vladimir Braverman · Stephen Chestnut · Robert Krauthgamer · Yi Li · David Woodruff · Lin Yang
2017 Poster: Clustering High Dimensional Dynamic Data Streams
Lin Yang · Harry Lang · Christian Sohler · Vladimir Braverman · Gereon Frahling
2017 Talk: Clustering High Dimensional Dynamic Data Streams
Lin Yang · Harry Lang · Christian Sohler · Vladimir Braverman · Gereon Frahling
2017 Poster: Coresets for Vector Summarization with Applications to Network Graphs
Dan Feldman · Sedat Ozer · Daniela Rus
2017 Talk: Coresets for Vector Summarization with Applications to Network Graphs
Dan Feldman · Sedat Ozer · Daniela Rus