Graph is a Natural Regularization: Revisiting Vector Quantization for Graph Representation Learning
Abstract
Vector Quantization (VQ) has recently emerged as a promising approach for learning discrete representations of graph-structured data. However, a fundamental challenge, codebook collapse, remains underexplored in the graph domain, significantly limiting the expressiveness and generalization of graph tokens. In this paper, we present the first empirical study of this phenomenon and observe that codebook collapse consistently occurs when VQ is applied to graph data, even with mitigation strategies proposed in the vision and language domains. Moreover, we diagnose collapse from both the data and optimization perspectives, showing that it is associated with graph properties such as feature redundancy and connectivity density, and is further reinforced by the training dynamics of deterministic hard assignment. To address these issues, we propose RGVQ, a novel framework that integrates graph topology and feature similarity as explicit regularization signals to enhance codebook utilization and promote token diversity. RGVQ introduces soft assignments via Gumbel-Softmax reparameterization, ensuring that all codewords receive gradient updates. In addition, RGVQ incorporates a structure-aware contrastive regularization that penalizes assigning the same token to dissimilar node pairs. Extensive experiments demonstrate that RGVQ substantially improves codebook utilization and consistently boosts the performance of state-of-the-art graph VQ backbones across multiple downstream tasks, enabling more expressive and transferable graph token representations.
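The abstract names two concrete mechanisms: Gumbel-Softmax soft assignment, so that every codeword receives gradients, and a structure-aware penalty on same-token assignments for dissimilar node pairs. The following is a minimal PyTorch sketch of what these could look like; the function names, the distance-based logits, and the strategy for mining dissimilar pairs are illustrative assumptions, since the abstract does not specify the actual implementation.

```python
# A minimal sketch of the two mechanisms described in the abstract.
# All names and hyperparameters are illustrative assumptions, not the
# authors' implementation.
import torch
import torch.nn.functional as F


def gumbel_soft_assign(z, codebook, tau=1.0):
    """Soft codeword assignment via Gumbel-Softmax reparameterization.

    z:        (N, d) node embeddings
    codebook: (K, d) learnable codewords
    Returns soft assignment probabilities (N, K) and quantized embeddings
    (N, d). Because every codeword contributes to the soft mixture, all
    codewords receive gradient updates, unlike deterministic hard
    (argmin) assignment.
    """
    # Logits from negative squared Euclidean distance to each codeword.
    dists = torch.cdist(z, codebook, p=2) ** 2          # (N, K)
    probs = F.gumbel_softmax(-dists, tau=tau, dim=-1)   # (N, K), differentiable
    z_q = probs @ codebook                              # (N, d) soft quantization
    return probs, z_q


def structure_aware_penalty(probs, dissim_pairs):
    """Penalize assigning the same token to dissimilar node pairs.

    dissim_pairs: (M, 2) index pairs judged dissimilar by graph topology
    and feature similarity (how such pairs are mined is an assumption
    here). probs[i] @ probs[j] approximates the probability that nodes i
    and j receive the same token; minimizing it promotes token diversity.
    """
    pi = probs[dissim_pairs[:, 0]]       # (M, K)
    pj = probs[dissim_pairs[:, 1]]       # (M, K)
    return (pi * pj).sum(dim=-1).mean()
```

In this reading, the penalty term would be added to the usual VQ training objective with a weighting coefficient, and the temperature `tau` could be annealed so that soft assignments sharpen toward discrete tokens as training progresses; both choices are assumptions beyond what the abstract states.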