Workshop: Subset Selection in Machine Learning: From Theory to Applications

Online and Non Parametric Coresets for Bregman Divergence

Jayesh Choudhari · Anirban Dasgupta · Rachit Chhaya · Supratim Shit

Keywords: [ Stochastic Optimization ]

[ Abstract ]
Sat 24 Jul noon PDT — 12:04 p.m. PDT

Abstract: We present algorithms that create coresets for clustering problems according to a wide subset of Bregman divergences. Our coresets have a small additive error, similar in magnitude to the lightweight coresets~\cite{bachem2018scalable}, and take an update time $O(d)$. We first give an algorithm that returns {\em online} coresets of size $\tilde{O}((dk\log k)/\epsilon^2\mu^2)$ for $k$-clustering according to any $\mu$-similar Bregman divergence. We also show the existence of {\em non-parametric} coresets, i.e. the coreset size is independent of the number of clusters $k$, as well the dimension of points $d$, for the same subclass of Bregman divergences.