Large Scale Manifold Balanced Clustering
Abstract
Manifold clustering captures complex data structures effectively and has been widely studied in cluster analysis. However, many existing methods focus mainly on combining K-means with manifold learning while overlooking the consistency between data structures and clustering labels, and they often incur high computational cost on large-scale data. To address these issues, we propose a large-scale manifold balanced clustering method based on anchor-induced distance (LMBC), grounded in the relationship between K-means clustering and manifold learning. Specifically, LMBC uses label information to guide the construction of the manifold structure, thereby ensuring consistency between data structures and clustering labels. To enable large-scale clustering, we introduce an anchor-induced distance representation that models the manifold structure in a compact anchor space, significantly reducing computational complexity while preserving essential structural information. Furthermore, to maintain class balance naturally during clustering, we maximize the Schatten-p norm of the label representation and provide a theoretical analysis to support its effectiveness. Experimental results on several benchmark datasets demonstrate the effectiveness and scalability of the proposed method.
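As a rough illustration of why maximizing a Schatten-p norm can favor balanced clusters, the sketch below compares a balanced and a skewed assignment. It is not the LMBC objective itself. It assumes a hard one-hot label matrix and a value of p in (0, 2), neither of which is specified in the abstract; the helper names `one_hot` and `schatten_p_norm_p` are hypothetical. Under these assumptions the singular values of the label matrix are the square roots of the cluster sizes, so the p-th power of the norm is a sum of concave functions of the sizes and is largest when the sizes are equal.

```python
import numpy as np

def schatten_p_norm_p(Y, p):
    """Return the Schatten-p norm of Y raised to the power p,
    i.e. the sum of the singular values of Y, each raised to p."""
    s = np.linalg.svd(Y, compute_uv=False)
    return float(np.sum(s ** p))

def one_hot(sizes):
    """Build an n x c one-hot label matrix with the given cluster sizes
    (hypothetical helper, used only for this illustration)."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    Y = np.zeros((labels.size, len(sizes)))
    Y[np.arange(labels.size), labels] = 1.0
    return Y

p = 1.0  # assumed value in (0, 2); the abstract does not fix p

# Same number of samples (120) and clusters (3), different size profiles.
balanced = schatten_p_norm_p(one_hot([40, 40, 40]), p)
skewed = schatten_p_norm_p(one_hot([100, 15, 5]), p)

# The balanced assignment attains the larger value.
print(balanced > skewed)
```

For p = 2 the quantity equals the number of samples regardless of the cluster sizes, and for p > 2 the ordering reverses, which is why the sketch restricts p to (0, 2).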