Improving Graph Transformers via Global Structural Priors
Abstract
By coupling graph topology with the globally expressive attention mechanism, Graph Transformers (GTs) have emerged as a dominant architecture for node classification. However, existing models focus primarily on diverse topology-injection mechanisms, namely score-level and representation-level designs, and lack a unified theoretical foundation that characterizes how these mechanisms shape representation propagation. To bridge this gap, this paper unifies both designs under a common Graph Signal Denoising framework, revealing that denoising efficacy (\textit{i.e.}, representation quality) is fundamentally dictated by the block-diagonal structure of the propagation operator. To instantiate this prior efficiently, this paper introduces a Block-Diagonal GT architecture, named \textsc{BDFormer}, which enforces the block-diagonal constraint through spectral-regularized cross-attention over latent anchors. Specifically, by routing global interactions through these anchors, \textsc{BDFormer} imposes the spectral block-diagonal constraint directly on the anchor-level affinity matrix. Crucially, the learned global affinity also guides the pruning of local heterophilous edges, so that both the global and local scales adhere to the same block-diagonal prior. Extensive evaluations on benchmark datasets demonstrate the scalability and robustness of \textsc{BDFormer}.
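To make the anchor-routing idea concrete, the following PyTorch-style sketch illustrates cross-attention routed through latent anchors together with a simple block-diagonal penalty on the anchor-level affinity. It is a minimal, hypothetical rendering: the class name \texttt{AnchorCrossAttention}, the fixed anchor-to-block assignment, and the squared off-block penalty are assumptions standing in for the spectral regularizer, not the actual \textsc{BDFormer} implementation.

\begin{verbatim}
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnchorCrossAttention(nn.Module):
    """Illustrative sketch (not the paper's code): global mixing is routed
    through K latent anchors, costing O(N*K) rather than O(N^2), and a
    block-diagonal penalty on the anchor-level affinity acts as a simple
    proxy for the spectral block-diagonal regularizer."""

    def __init__(self, dim: int, num_anchors: int, num_blocks: int):
        super().__init__()
        self.anchors = nn.Parameter(torch.randn(num_anchors, dim))  # latent anchors
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5
        # Fixed anchor-to-block assignment (an illustrative choice only).
        self.register_buffer("block_id", torch.arange(num_anchors) % num_blocks)

    def forward(self, x):
        # x: (N, dim) node features
        # 1) Anchors gather information from all nodes.
        gather = F.softmax(self.q(self.anchors) @ self.k(x).T * self.scale, dim=-1)  # (K, N)
        anchor_msg = gather @ self.v(x)                                              # (K, dim)
        # 2) Nodes read back from the anchors, completing the global interaction.
        scatter = F.softmax(self.q(x) @ self.k(anchor_msg).T * self.scale, dim=-1)   # (N, K)
        out = scatter @ anchor_msg                                                   # (N, dim)
        # 3) Penalize anchor-level affinity mass that falls outside the blocks.
        affinity = F.softmax(anchor_msg @ anchor_msg.T * self.scale, dim=-1)         # (K, K)
        off_block = self.block_id.unsqueeze(0) != self.block_id.unsqueeze(1)
        reg = (affinity * off_block).pow(2).sum()
        return out, reg
\end{verbatim}

In training, the returned penalty \texttt{reg} would simply be added to the task loss with a weighting coefficient; the learned anchor affinity could likewise be thresholded to decide which local heterophilous edges to prune, in the spirit of the cross-scale guidance described above.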