
Spotlight

Sharper Generalization Bounds for Clustering

Shaojie Li · Yong Liu

Abstract: Existing generalization analysis of clustering mainly focuses on specific instantiations, such as (kernel) k-means, and a unified framework for studying clustering performance is still lacking. In addition, the existing excess clustering risk bounds are mostly of order O(K/√n), provided that the underlying distribution has bounded support, where n is the sample size and K is the number of clusters, or of order O(K²/n) under strong assumptions on the underlying distribution that are hard to verify in general. In this paper, we propose a unified clustering learning framework and investigate its excess risk bounds, obtaining state-of-the-art upper bounds under mild assumptions. Specifically, we derive sharper bounds of order O(K²/n) under mild assumptions on the covering number of the hypothesis spaces, where these assumptions are easy to verify. Moreover, for hard clustering schemes, such as (kernel) k-means, assuming only that the hypothesis functions are bounded, we improve the upper bounds from order O(K/√n) to O(√K/√n). Furthermore, state-of-the-art bounds of the faster order O(K/n) are obtained under the covering number assumptions.
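To make the rate comparison concrete, the sketch below writes out the quantity these bounds control for the plain k-means instantiation the abstract mentions. The notation (W for the population risk, W_n for the empirical risk, ĉ_n for the empirical minimizer) is assumed here for illustration and is not taken from the paper, whose framework is more general.

```latex
\documentclass{article}
\usepackage{amsmath}
\usepackage{amssymb}
\begin{document}

% Illustrative k-means instantiation (assumed notation, not the paper's):
% c = (c_1, ..., c_K) are K centers in R^d; X_1, ..., X_n are i.i.d. from P.
\[
  W(c) = \mathbb{E}_{X \sim P}\Big[\min_{1 \le k \le K} \lVert X - c_k \rVert^2\Big],
  \qquad
  W_n(c) = \frac{1}{n} \sum_{i=1}^{n} \min_{1 \le k \le K} \lVert X_i - c_k \rVert^2 .
\]
% Excess risk of the empirically optimal centers \hat{c}_n:
\[
  \mathcal{E}(\hat{c}_n) = W(\hat{c}_n) - \inf_{c} W(c),
  \qquad
  \hat{c}_n \in \operatorname*{arg\,min}_{c} W_n(c).
\]
% Rates compared in the abstract (constants and log factors suppressed):
%   bounded support:              O(K / sqrt(n))
%   bounded hypotheses (hard):    O(sqrt(K) / sqrt(n))
%   covering-number assumptions:  O(K / n)

\end{document}
```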
