Quaternion Self-Attention with Shared Scores
Abstract
Quaternion Neural Networks are parameter-efficient and model multidimensional dependencies by representing four related features as a single entity. However, existing quaternion self-attention computes component-wise scores and applies an independent softmax to each component, which increases computational cost and allows the attention distributions to diverge across components. We propose a Shared-Score Quaternion Self-Attention mechanism that computes a single real-valued score via the quaternion inner product and applies one shared attention distribution across all components. This reduces score-computation multiplications by 75\% and the number of softmax operations from four to one. We prove that the component-wise and shared scores lie in the same interaction subspace, namely the linear span of the bilinear terms induced by quaternion linear projections. This indicates that independent component-wise attention primarily re-parameterizes the same interactions rather than fundamentally expanding the feature-interaction space. In speech enhancement, where phase information is crucial, our method reduces inference time by 45--61\% while maintaining enhancement quality, making quaternion attention a more practical choice. These findings offer a systematic basis for efficient hypercomplex attention.
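To make the mechanism concrete, below is a minimal NumPy sketch of shared-score attention as described above: the four component-wise dot products are summed into one real-valued score (the quaternion inner product), a single softmax is taken, and the resulting distribution weights every value component. The `(4, T, d)` component layout, the function name, and the `1/sqrt(4d)` scaling are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def shared_score_quaternion_attention(Q, K, V):
    """Shared-score quaternion self-attention (illustrative sketch).

    Q, K, V: arrays of shape (4, T, d) holding the four quaternion
    components (r, i, j, k) of the projected queries/keys/values.
    """
    _, T, d = Q.shape
    # Quaternion inner product: one real-valued score per query/key pair,
    # i.e. the sum of the four component-wise dot products (4 real
    # multiplications per feature vs. 16 for a full Hamilton product,
    # matching the 75% reduction claimed in the abstract).
    scores = np.einsum('ctd,csd->ts', Q, K) / np.sqrt(4 * d)  # scaling assumed
    # A single softmax shared by all components (vs. four independent
    # softmaxes in component-wise quaternion attention).
    A = softmax(scores, axis=-1)            # (T, T)
    # The same attention distribution weights every value component,
    # so the four output components cannot diverge.
    return np.einsum('ts,csd->ctd', A, V)   # (4, T, d)

# Toy usage with hypothetical shapes.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8, 16)) for _ in range(3))
out = shared_score_quaternion_attention(Q, K, V)
print(out.shape)  # (4, 8, 16)
```

Because the softmax and score tensor are computed once rather than per component, the attention weights stay aligned across the real and imaginary parts by construction, which is the property the component-wise variant gives up.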