Beyond Looking Up, Try Looking Around: Harmonizing Global Structure and Local Consistency in Optimal Transport for Short Text Clustering
Abstract
Pseudo-labeling based on Optimal Transport (OT) has become an effective mechanism for enhancing short text clustering. However, existing OT methods fall short in modeling semantic consistency between samples, and may therefore assign different pseudo-labels to semantically similar samples. These erroneous pseudo-labels can cause the model to produce inferior clusters. This paper proposes a novel short text clustering framework that remedies the neglect of semantic consistency in existing OT methods and generates reliable pseudo-labels to facilitate clustering. Specifically, our method first introduces an instance-level attention mechanism to capture semantic relationships between samples. These relationships are then integrated into the OT formulation to endow the transport process with neighborhood semantic awareness. Solving the proposed OT formulation yields reliable pseudo-labels that simultaneously account for sample-to-sample semantic consistency and sample-to-cluster global structure. These pseudo-labels then serve as supervisory signals that guide the model toward accurate clustering. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art approaches. The code is available at: https://anonymous.4open.science/r/RPDC-STC-8B53/README.md
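To make the idea concrete, the following is a minimal illustrative sketch, not the formulation proposed in this paper. It assumes that (i) instance-level attention is a row-wise softmax over embedding dot products, (ii) neighborhood awareness is injected by mixing each sample's cluster probabilities with the attention-weighted average of its neighbors' probabilities, and (iii) the OT problem is entropy-regularized with uniform marginals and solved by Sinkhorn iterations. The function name `neighbor_aware_pseudo_labels` and the parameters `alpha`, `temp`, and `eps` are hypothetical.

```python
import math

def softmax_rows(mat):
    """Row-wise softmax with max-subtraction for numerical stability."""
    out = []
    for row in mat:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def neighbor_aware_pseudo_labels(probs, emb, alpha=0.5, temp=0.5, eps=0.5, iters=200):
    """Sketch: attention-smoothed predictions fed into Sinkhorn OT.

    probs: n x k predicted cluster probabilities (sample-to-cluster view).
    emb:   n x d sample embeddings (sample-to-sample view).
    alpha, temp, eps are assumed hyperparameters, not taken from the paper.
    """
    n, k = len(probs), len(probs[0])

    # (i) Assumed instance-level attention: softmax over scaled dot products.
    sims = matmul(emb, [list(col) for col in zip(*emb)])
    attn = softmax_rows([[s / temp for s in row] for row in sims])

    # (ii) Mix each sample's prediction with its neighbors' predictions.
    neigh = matmul(attn, probs)
    mixed = [[(1 - alpha) * p + alpha * q for p, q in zip(pr, nr)]
             for pr, nr in zip(probs, neigh)]

    # (iii) Entropic OT with cost C = -log(mixed); the Gibbs kernel
    # exp(-C / eps) simplifies to mixed ** (1 / eps).
    kernel = [[max(x, 1e-12) ** (1.0 / eps) for x in row] for row in mixed]
    r, c = 1.0 / n, 1.0 / k  # uniform sample and cluster marginals
    u, v = [1.0] * n, [1.0] * k
    for _ in range(iters):
        u = [r / sum(kernel[i][j] * v[j] for j in range(k)) for i in range(n)]
        v = [c / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(k)]

    # Rescale the transport plan by n so each row is a soft pseudo-label.
    return [[n * u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]

# Toy usage: samples 0-1 and 2-3 are semantic neighbors, but the raw
# predictions for samples 1 and 2 disagree with their neighborhoods.
toy_emb = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
toy_probs = [[0.9, 0.1], [0.45, 0.55], [0.55, 0.45], [0.1, 0.9]]
toy_q = neighbor_aware_pseudo_labels(toy_probs, toy_emb)
toy_labels = [row.index(max(row)) for row in toy_q]
```

In this toy setting, the uniform column marginal plays the role of the global structure constraint (clusters receive equal mass), while the attention-weighted mixing pulls the pseudo-labels of neighboring samples toward each other before transport.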