Many machine learning problems involve processing large datasets, which makes them computationally expensive. One mitigation strategy is to select a representative subset of the data, and one way to achieve this is via coresets: weighted subsets of the given data such that the cost function under consideration can be approximated additively and/or multiplicatively. While much previous work chooses these subsets using independent importance-based sampling, such methods can be suboptimal when the system possesses redundancy. Recently, Tremblay et al. (2019) considered sampling from a fixed-size determinantal point process (DPP) and gave lower bounds on the coreset sample complexity. While DPPs often empirically outperform importance-based sampling methods, their coreset size bound is worse than that of independent sampling. In this work, we extend their theoretical results by improving the lower bound on sample complexity for coreset approximation by a log(1/ε) factor using chaining, a technique from the empirical-process literature. This takes a further step toward bridging the gap between theoretical justification and empirical observations. We provide additive approximation guarantees for the associated cost function and allow the possibility of negative weights, which has a direct application in layer-wise pruning of deep neural networks. Beyond additive approximation guarantees, we also extend our bounds to scenarios with multiplicative approximations.
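For concreteness, the sketch below (assuming only NumPy) illustrates the generic importance-sampling baseline that the abstract contrasts against: a weighted subset whose weighted cost is an unbiased estimate of the full additive cost. The function names (importance_coreset, coreset_cost), the toy squared-distance cost, and the distance-from-mean sensitivity proxy are all illustrative assumptions, not the paper's m-DPP estimator or its chaining-based analysis.

```python
# Minimal illustrative sketch (NOT the paper's m-DPP estimator): build a weighted
# coreset by independent importance sampling and compare its weighted cost to the
# full additive cost. All names and the toy cost are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)


def cost(points, center):
    """Toy additive cost: sum of squared distances to a single query center."""
    return np.sum(np.linalg.norm(points - center, axis=1) ** 2)


def importance_coreset(points, m, rng):
    """Draw m points i.i.d. with probability p_i and weight each by 1/(m * p_i),
    so the weighted subset cost is an unbiased estimate of the full cost."""
    # Crude sensitivity proxy: importance proportional to squared distance from the mean.
    scores = np.linalg.norm(points - points.mean(axis=0), axis=1) ** 2 + 1e-12
    probs = scores / scores.sum()
    idx = rng.choice(len(points), size=m, replace=True, p=probs)
    weights = 1.0 / (m * probs[idx])
    return points[idx], weights


def coreset_cost(subset, weights, center):
    """Weighted cost on the coreset, intended to approximate cost(points, center)."""
    return np.sum(weights * np.linalg.norm(subset - center, axis=1) ** 2)


X = rng.normal(size=(10_000, 5))
center = np.ones(5)
S, w = importance_coreset(X, m=200, rng=rng)
full, approx = cost(X, center), coreset_cost(S, w, center)
print(f"full cost {full:.1f}, coreset estimate {approx:.1f}, "
      f"relative error {abs(full - approx) / full:.3f}")
```

The paper's approach would replace the i.i.d. draw above with a sample from a fixed-size DPP (weighted by inclusion probabilities), which introduces negative correlations between selected points and tends to avoid redundant selections; the contribution here is a tighter sample-complexity analysis of that estimator via chaining.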
Author Information
Gantavya Bhatt (University of Washington)
Jeff Bilmes (UW)
More from the Same Authors
- 2023: Accelerating Batch Active Learning Using Continual Learning Techniques
  Gantavya Bhatt · Arnav M Das · · Rui Yang · Vianne Gao · Jeff Bilmes
- 2021: Tighter m-DPP Coreset Sample Complexity Bounds
  Jeff Bilmes · Gantavya Bhatt
- 2021: More Information, Less Data
  Jeff Bilmes
- 2021: Introduction by the Organizers
  Abir De · Rishabh Iyer · Ganesh Ramakrishnan · Jeff Bilmes
- 2021 Workshop: Subset Selection in Machine Learning: From Theory to Applications
  Rishabh Iyer · Abir De · Ganesh Ramakrishnan · Jeff Bilmes
- 2020 Poster: Coresets for Data-efficient Training of Machine Learning Models
  Baharan Mirzasoleiman · Jeff Bilmes · Jure Leskovec
- 2020 Poster: Time-Consistent Self-Supervision for Semi-Supervised Learning
  Tianyi Zhou · Shengjie Wang · Jeff Bilmes
- 2019: Jeff Bilmes: Deep Submodular Synergies
  Jeff Bilmes
- 2019 Poster: Bias Also Matters: Bias Attribution for Deep Neural Network Explanation
  Shengjie Wang · Tianyi Zhou · Jeff Bilmes
- 2019 Oral: Bias Also Matters: Bias Attribution for Deep Neural Network Explanation
  Shengjie Wang · Tianyi Zhou · Jeff Bilmes
- 2019 Poster: Jumpout : Improved Dropout for Deep Neural Networks with ReLUs
  Shengjie Wang · Tianyi Zhou · Jeff Bilmes
- 2019 Poster: Combating Label Noise in Deep Learning using Abstention
  Sunil Thulasidasan · Tanmoy Bhattacharya · Jeff Bilmes · Gopinath Chennupati · Jamal Mohd-Yusof
- 2019 Oral: Jumpout : Improved Dropout for Deep Neural Networks with ReLUs
  Shengjie Wang · Tianyi Zhou · Jeff Bilmes
- 2019 Oral: Combating Label Noise in Deep Learning using Abstention
  Sunil Thulasidasan · Tanmoy Bhattacharya · Jeff Bilmes · Gopinath Chennupati · Jamal Mohd-Yusof
- 2018 Poster: Constrained Interacting Submodular Groupings
  Andrew Cotter · Mahdi Milani Fard · Seungil You · Maya Gupta · Jeff Bilmes
- 2018 Poster: Greed is Still Good: Maximizing Monotone Submodular+Supermodular (BP) Functions
  Wenruo Bai · Jeff Bilmes
- 2018 Oral: Constrained Interacting Submodular Groupings
  Andrew Cotter · Mahdi Milani Fard · Seungil You · Maya Gupta · Jeff Bilmes
- 2018 Oral: Greed is Still Good: Maximizing Monotone Submodular+Supermodular (BP) Functions
  Wenruo Bai · Jeff Bilmes