Workshop: Subset Selection in Machine Learning: From Theory to Applications

Online and Non Parametric Coresets for Bregman Divergence

Supratim Shit · Rachit Chhaya · Anirban Dasgupta · Jayesh Choudhari

Keywords: [ Stochastic Optimization ]

Abstract: We present algorithms that create coresets for clustering problems according to a wide subset of Bregman divergences. Our coresets have a small additive error, similar in magnitude to the lightweight coresets~\cite{bachem2018scalable}, and take an update time $O(d)$. We first give an algorithm that returns {\em online} coresets of size $\tilde{O}((dk\log k)/\epsilon^2\mu^2)$ for $k$-clustering according to any $\mu$-similar Bregman divergence. We also show the existence of {\em non-parametric} coresets, i.e. the coreset size is independent of the number of clusters $k$, as well the dimension of points $d$, for the same subclass of Bregman divergences.