Many machine learning problems involve processing large datasets, making them computationally expensive tasks. One mitigation strategy is to select representative subsets of the data, and one way to achieve this is via coresets: weighted subsets of the given data such that the cost function under consideration can be additively and/or multiplicatively approximated. While much previous work chooses these subsets using independent importance-based sampling, such methods can be suboptimal when the system possesses redundancy. Recently, Tremblay et al. (2019) considered sampling from a fixed-size determinantal point process (DPP) and gave lower bounds on the coreset sample complexity. While DPPs often empirically outperform importance-based sampling methods, their coreset size bound is worse than that of independent sampling. In this work, we extend their theoretical results by improving the lower bound on the sample complexity for coreset approximation by a log(1/ε) factor, using chaining, a technique from the empirical process literature. This moves a step further in bridging the gap between theoretical justification and empirical observations. We provide additive approximation guarantees for the associated cost function and allow the possibility of negative weights, which has a direct application in layer-wise pruning of deep neural networks. Beyond additive approximation guarantees, we also extend our bounds to settings with multiplicative approximations.
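As a concrete illustration of the setup described above, the following is a minimal sketch, not the paper's method, of a fixed-size DPP (k-DPP) coreset compared against an independent-sampling baseline: a size-k subset S is drawn with probability proportional to det(L_S) for a similarity kernel L, and each selected point is weighted by the inverse of its marginal inclusion probability so that the weighted coreset cost is an unbiased estimate of the full cost. The RBF kernel, the quadratic per-point cost, and the brute-force enumeration of size-k subsets (feasible only for tiny n) are illustrative assumptions, not choices taken from the paper.

import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy data and a simple quadratic per-point cost whose total we want to approximate.
n, k = 12, 4
X = rng.normal(size=(n, 2))
cost = np.sum(X ** 2, axis=1)    # per-point cost f(x_i)
total_cost = cost.sum()          # quantity the coreset should approximate

# Illustrative RBF similarity kernel L; the k-DPP puts P(S) proportional to det(L_S).
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
L = np.exp(-sq_dists / 2.0)

# Brute-force k-DPP: enumerate every size-k subset and its principal-minor determinant.
subsets = list(itertools.combinations(range(n), k))
dets = np.array([np.linalg.det(L[np.ix_(S, S)]) for S in subsets])
dets = np.maximum(dets, 0.0)     # guard against tiny negative values from floating point
probs = dets / dets.sum()

# Exact marginal inclusion probabilities pi_i = P(i in S) under this k-DPP.
pi = np.zeros(n)
for S, p in zip(subsets, probs):
    pi[list(S)] += p

# Draw one k-DPP coreset; weighting each sampled point by 1/pi_i gives an unbiased
# estimate sum_{i in S} f(x_i) / pi_i of the total cost.
S = subsets[rng.choice(len(subsets), p=probs)]
dpp_estimate = np.sum(cost[list(S)] / pi[list(S)])

# Independent baseline: uniform i.i.d. sampling with the standard (n/k) weights
# (importance-based sampling would instead reweight by 1/(k * q_i) for a proposal q).
idx = rng.choice(n, size=k, replace=True)
iid_estimate = (n / k) * cost[idx].sum()

print(f"true total cost        : {total_cost:.3f}")
print(f"k-DPP coreset estimate : {dpp_estimate:.3f}")
print(f"uniform i.i.d. estimate: {iid_estimate:.3f}")

Subset enumeration is used here only so that the k-DPP distribution and its inclusion probabilities are exact and easy to verify; practical k-DPP samplers rely on eigendecomposition-based or MCMC algorithms instead.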
Author Information
Jeff Bilmes (UW)
Gantavya Bhatt (University of Washington)
More from the Same Authors

2021 : Tighter m-DPP Coreset Sample Complexity Bounds »
Gantavya Bhatt · Jeff Bilmes 
2021 : More Information, Less Data »
Jeff Bilmes 
2021 : Introduction by the Organizers »
Abir De · Rishabh Iyer · Ganesh Ramakrishnan · Jeff Bilmes 
2021 Workshop: Subset Selection in Machine Learning: From Theory to Applications »
Rishabh Iyer · Abir De · Ganesh Ramakrishnan · Jeff Bilmes 
2020 Poster: Coresets for Data-efficient Training of Machine Learning Models »
Baharan Mirzasoleiman · Jeff Bilmes · Jure Leskovec 
2020 Poster: Time-Consistent Self-Supervision for Semi-Supervised Learning »
Tianyi Zhou · Shengjie Wang · Jeff Bilmes 
2019 : Jeff Bilmes: Deep Submodular Synergies »
Jeff Bilmes 
2019 Poster: Bias Also Matters: Bias Attribution for Deep Neural Network Explanation »
Shengjie Wang · Tianyi Zhou · Jeff Bilmes 
2019 Oral: Bias Also Matters: Bias Attribution for Deep Neural Network Explanation »
Shengjie Wang · Tianyi Zhou · Jeff Bilmes 
2019 Poster: Jumpout : Improved Dropout for Deep Neural Networks with ReLUs »
Shengjie Wang · Tianyi Zhou · Jeff Bilmes 
2019 Poster: Combating Label Noise in Deep Learning using Abstention »
Sunil Thulasidasan · Tanmoy Bhattacharya · Jeff Bilmes · Gopinath Chennupati · Jamal Mohd-Yusof 
2019 Oral: Jumpout : Improved Dropout for Deep Neural Networks with ReLUs »
Shengjie Wang · Tianyi Zhou · Jeff Bilmes 
2019 Oral: Combating Label Noise in Deep Learning using Abstention »
Sunil Thulasidasan · Tanmoy Bhattacharya · Jeff Bilmes · Gopinath Chennupati · Jamal Mohd-Yusof 
2018 Poster: Constrained Interacting Submodular Groupings »
Andrew Cotter · Mahdi Milani Fard · Seungil You · Maya Gupta · Jeff Bilmes 
2018 Poster: Greed is Still Good: Maximizing Monotone Submodular+Supermodular (BP) Functions »
Wenruo Bai · Jeff Bilmes 
2018 Oral: Constrained Interacting Submodular Groupings »
Andrew Cotter · Mahdi Milani Fard · Seungil You · Maya Gupta · Jeff Bilmes 
2018 Oral: Greed is Still Good: Maximizing Monotone Submodular+Supermodular (BP) Functions »
Wenruo Bai · Jeff Bilmes