Resilient Coresets and Consistent Clustering
Ashkan Norouzi-Fard ⋅ Silvio Lattanzi ⋅ MohammadHossein Bateni ⋅ Morteza Monemizadeh
Abstract
Many machine learning problems are geometric at their core, relying on metric representations of data for tasks such as clustering, prototype selection, nearest-neighbor search, and graph-based learning. Moreover, data constantly evolves and is routinely transformed through dimensionality reduction, random projections, feature embeddings, compression, or privacy-preserving mechanisms. These transformations are designed to preserve geometry approximately; as a result, they preserve objective values for many geometric optimization problems, but they fail to guarantee that algorithmic outcomes remain consistent. In this work, we study \emph{resilient data summaries} for geometric optimization. Building on the notion of \emph{$\gamma$-resilient algorithms} introduced by Ahmadian, we introduce $\gamma$-resilient coresets. A $\gamma$-resilient $(k,\varepsilon)$-coreset is a compact, weighted summary that guarantees a $(1+\varepsilon)$-approximation to the objective and enforces stability at the level of assignments. We complement our positive result with a lower bound showing that obtaining a tight approximation for resilient clustering necessarily requires a bi-criteria solution.
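For context, the approximation guarantee mentioned in the abstract follows the standard coreset template; a sketch (using $k$-means cost as a representative objective, with notation not taken from the paper) is:

```latex
% Standard (k,\varepsilon)-coreset guarantee, illustrated for k-means.
% P \subset \mathbb{R}^d is the input; S is a weighted summary with
% weights w : S \to \mathbb{R}_{\ge 0}. The resilient variant studied in
% the paper additionally constrains point-to-center assignments.
\[
  \mathrm{cost}(P, C) \;=\; \sum_{p \in P} \min_{c \in C} \lVert p - c \rVert^2,
  \qquad
  \mathrm{cost}_w(S, C) \;=\; \sum_{s \in S} w(s) \min_{c \in C} \lVert s - c \rVert^2 .
\]
\[
  \text{$S$ is a $(k,\varepsilon)$-coreset if, for every center set $C$ with $|C| = k$:}
  \quad
  (1-\varepsilon)\,\mathrm{cost}(P, C)
  \;\le\;
  \mathrm{cost}_w(S, C)
  \;\le\;
  (1+\varepsilon)\,\mathrm{cost}(P, C).
\]
```

Because the guarantee holds uniformly over all candidate center sets, optimizing over $S$ yields a $(1+O(\varepsilon))$-approximation on $P$; the resilient notion layers assignment-level stability on top of this objective-level guarantee.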