
Oral 1C Clustering

Hall A2
Tue 23 Jul 1:30 a.m. PDT — 2:30 a.m. PDT

Tue 23 July 1:30 - 1:45 PDT

LSEnet: Lorentz Structural Entropy Neural Network for Deep Graph Clustering

Li Sun · Zhenhao Huang · Hao Peng · YuJie Wang · Chunyang Liu · Philip Yu

Graph clustering is a fundamental problem in machine learning. Deep learning methods have achieved state-of-the-art results in recent years, but they still cannot work without a predefined number of clusters. This limitation motivates us to pose a more challenging problem: graph clustering with an unknown cluster number. We propose to address this problem from the fresh perspective of graph information theory (i.e., structural information). In the literature, structural information has not yet been introduced to deep clustering, and its classic definition is restricted to a discrete formulation and does not model node features. In this work, we first formulate a differentiable structural information (DSI) in the continuous realm, accompanied by several theoretical results. By minimizing DSI, we construct the optimal partitioning tree, in which densely connected nodes in the graph tend to receive the same assignment, revealing the cluster structure. DSI is also theoretically shown to be a new graph clustering objective that does not require a predefined cluster number. Furthermore, we design a neural network, LSEnet, in the Lorentz model of hyperbolic space, where we integrate node features into structural information via manifold-valued graph convolution. Extensive empirical results on real graphs show the superiority of our approach.
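
The abstract does not spell out the DSI formula, but the flavor of a differentiable, soft relaxation of one-level structural entropy can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: A is a dense symmetric adjacency matrix, P is a row-stochastic soft cluster assignment, and the two terms mirror the node-level and cut-level parts of the classic structural entropy definition.

```python
import torch

def soft_structural_entropy(A, P, eps=1e-12):
    """One-level soft relaxation of structural entropy (illustrative,
    not the paper's exact DSI). A: dense symmetric adjacency (n, n);
    P: soft assignment (n, k), rows summing to 1."""
    deg = A.sum(dim=1)                               # node degrees, (n,)
    vol = deg.sum()                                  # total volume, 2|E|
    vol_c = P.t() @ deg                              # soft cluster volumes, (k,)
    internal = torch.einsum('ij,ic,jc->c', A, P, P)  # soft internal volume, (k,)
    cut_c = vol_c - internal                         # soft cut weight per cluster
    # Node-level term: cost of locating a node inside its cluster.
    node = -(P * (deg / vol).unsqueeze(1)
             * torch.log2(deg.unsqueeze(1) / (vol_c + eps) + eps)).sum()
    # Cluster-level term: cost of entering a cluster across its cut.
    cluster = -(cut_c / vol * torch.log2(vol_c / vol + eps)).sum()
    return node + cluster
```

Minimizing this loss over unconstrained logits, with P = torch.softmax(logits, dim=1), pushes densely connected nodes toward the same cluster column; the actual LSEnet goes further, building a partitioning tree and integrating node features in the Lorentz model of hyperbolic space.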

Tue 23 July 1:45 - 2:00 PDT

Image Clustering with External Guidance

Yunfan Li · Peng Hu · Dezhong Peng · Jiancheng Lv · Jianping Fan · Xi Peng

The core of clustering lies in incorporating prior knowledge to construct supervision signals. From classic k-means based on data compactness to recent contrastive clustering guided by self-supervision, the evolution of clustering methods intrinsically corresponds to the progression of supervision signals. To date, substantial effort has been devoted to mining internal supervision signals from the data. Nevertheless, abundant external knowledge, such as semantic descriptions, which naturally lends itself to clustering, is regrettably overlooked. In this work, we propose leveraging external knowledge as a new supervision signal to guide clustering. To implement and validate our idea, we design an externally guided clustering method (Text-Aided Clustering, TAC), which leverages the textual semantics of WordNet to facilitate image clustering. Specifically, TAC first selects and retrieves the WordNet nouns that best distinguish the images, enhancing feature discriminability. Then, TAC coordinates the text and image modalities by mutually distilling cross-modal neighborhood information. Experiments demonstrate that TAC achieves state-of-the-art performance on five widely used and three more challenging image clustering benchmarks, including the full ImageNet-1K dataset. The code can be accessed at https://github.com/XLearning-SCU/2024-ICML-TAC.
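
As a rough illustration of external textual guidance (not the actual TAC pipeline), one can score candidate nouns by how much their similarity to the images varies, keep the most discriminative ones, and cluster the images on their similarity profiles. The shared image/text embedding space (e.g. CLIP's) and the variance-based selection rule are assumptions of this sketch:

```python
import numpy as np
from sklearn.cluster import KMeans

def text_guided_clustering(image_emb, noun_emb, n_clusters, top_m=64):
    """Cluster images via a text-similarity profile (illustrative sketch,
    not the actual TAC method). image_emb (n_images, d) and
    noun_emb (n_nouns, d) are assumed to share a space such as CLIP's."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = noun_emb / np.linalg.norm(noun_emb, axis=1, keepdims=True)
    sim = img @ txt.T                    # cosine similarities, (n_images, n_nouns)
    # Crude discriminability proxy: nouns whose similarity varies most
    # across the images separate them best.
    keep = np.argsort(sim.var(axis=0))[-top_m:]
    # Cluster the images on their similarity profile to the kept nouns.
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(sim[:, keep])
```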

Tue 23 July 2:00 - 2:15 PDT

Making Old Things New: A Unified Algorithm for Differentially Private Clustering

Max Dupre la Tour · Monika Henzinger · David Saulpic

As a staple of data analysis and unsupervised learning, the problem of private clustering has been widely studied under various privacy models. Centralized differential privacy was the first of them, and the problem has also been studied under the local and shuffle variants. In each case, the goal is to design an algorithm that privately computes a clustering with the smallest possible error. The study of each variant gave rise to new algorithms, so the landscape of private clustering algorithms is quite intricate. In this paper, we show that a 20-year-old algorithm can be slightly modified to work in any of these models. This provides a unified picture: while matching almost all previously known results, it allows us to improve some of them and to extend to a new privacy model, the continual observation setting, where the input changes over time and the algorithm must output a new solution at each time step.
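
The unified algorithm itself is not described in the abstract; for context, the standard centralized-DP building block in this area is a Lloyd-style update computed from noisy per-cluster sums and counts. The budget split and the [-1, 1]^d bounding box below are illustrative assumptions, not details from the paper:

```python
import numpy as np

def dp_lloyd_step(points, assignments, k, epsilon, rng):
    """One Lloyd (k-means) step made differentially private by adding
    Laplace noise to per-cluster sums and counts -- the standard
    centralized-DP recipe, not the paper's unified algorithm.
    Assumes points lie in [-1, 1]^d, so adding or removing one point
    shifts a cluster's coordinate sum by at most d in L1 norm
    and its count by 1."""
    d = points.shape[1]
    eps_count, eps_sum = epsilon / 2, epsilon / 2    # split the budget
    centers = np.zeros((k, d))
    for c in range(k):
        members = points[assignments == c]
        noisy_count = len(members) + rng.laplace(scale=1.0 / eps_count)
        noisy_sum = members.sum(axis=0) + rng.laplace(scale=d / eps_sum, size=d)
        centers[c] = noisy_sum / max(noisy_count, 1.0)
    return centers
```

With rng = np.random.default_rng(0), each call consumes epsilon of the privacy budget, so T iterations cost T * epsilon under basic composition.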

Tue 23 July 2:15 - 2:30 PDT

Pruned Pivot: Correlation Clustering Algorithm for Dynamic, Parallel, and Local Computation Models

Mina Dalirrooyfard · Konstantin Makarychev · Slobodan Mitrovic

Given a graph with positive and negative edge labels, the correlation clustering problem aims to cluster the nodes so as to minimize the total number of between-cluster positive and within-cluster negative edges. This problem has many applications in data mining, particularly in unsupervised learning. Motivated by the prevalence of large graphs and constantly changing data in modern applications, we study correlation clustering in the dynamic, parallel (MPC), and local computation (LCA) settings. We design an approach that improves on the state-of-the-art runtime complexities in all these settings. In particular, we provide the first fully dynamic algorithm that runs in expected amortized constant time, without any dependence on the graph size. Moreover, our algorithm essentially matches the approximation guarantee of the celebrated Pivot algorithm.
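
For reference, the Pivot algorithm that Pruned Pivot builds on is simple to state: process nodes in random order, and each time an unclustered node comes up, make it a pivot and cluster it with its still-unclustered positive neighbors. A minimal sketch:

```python
import random

def pivot(nodes, positive_edges, seed=0):
    """Classic Pivot for correlation clustering (Ailon, Charikar, Newman):
    repeatedly pick a random unclustered node and cluster it with its
    unclustered positive neighbors. In expectation this is a
    3-approximation on complete graphs with +/- edge labels."""
    rng = random.Random(seed)
    adj = {v: set() for v in nodes}
    for u, v in positive_edges:          # pairs absent here count as negative
        adj[u].add(v)
        adj[v].add(u)
    order = list(nodes)
    rng.shuffle(order)                   # random pivot order
    clustered, clusters = set(), []
    for p in order:
        if p in clustered:
            continue
        cluster = {p} | {v for v in adj[p] if v not in clustered}
        clustered |= cluster
        clusters.append(cluster)
    return clusters
```

Per the abstract, the paper's contribution is making this idea maintainable under updates and implementable in the dynamic, MPC, and LCA settings while essentially preserving the approximation guarantee.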