Efficient Transformer Attention for SNNs via Hadamard Simplification
Tingting Jiang ⋅ Jiangrong Shen ⋅ Long Chen ⋅ Yaxin Li ⋅ Qi Xu
Abstract
Spiking Neural Networks (SNNs) offer low-power, brain-inspired computation, but Transformer-based SNNs face deployment challenges on neuromorphic hardware due to complex operations and high communication overhead. We propose hardware-efficient attention mechanisms, \textbf{Simplified Spiking Attention (SSA)} and \textbf{Ultra-Simplified Spiking Attention (USSA)}, which replace matrix multiplications with Hadamard products and remove multi-head attention, scaling, and patching. We show theoretically that double masking is redundant and that early-spiking gating preserves richer temporal information than late-spiking gating. On the CIFAR‑10, CIFAR‑100, and DVS‑Gesture datasets, SSA achieves accuracies of 96.38\%, 79.45\%, and 97.56\%, respectively, outperforming baseline Transformer‑SNNs by up to +1.73\%, while reducing computational complexity from $\mathcal{O}(N^2D)$ to $\mathcal{O}(ND)$ and communication complexity from $\mathcal{O}(N^2)$ to $\mathcal{O}(ND)$. USSA further compresses communication complexity to $\mathcal{O}(N)$ with only marginal accuracy loss.
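To make the Hadamard-product idea concrete, the following is a minimal PyTorch sketch of a single-head, element-wise spiking attention. The function name `simplified_spiking_attention`, the $(N, D)$ tensor layout, and the use of a plain element-wise AND as the gating rule are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import torch

def simplified_spiking_attention(q: torch.Tensor,
                                 k: torch.Tensor,
                                 v: torch.Tensor) -> torch.Tensor:
    """Hypothetical single-head SSA-style sketch (illustrative only).

    q, k, v are binary spike tensors of shape (N, D): N tokens, D channels.
    Because the inputs are 0/1 spikes, the Hadamard product q * k acts as
    an element-wise logical AND, and the result gates the value spikes.
    This costs O(N*D) element-wise operations, versus O(N^2 * D) for dense
    attention Q K^T V, and builds no N x N attention map to communicate.
    """
    return q * k * v  # pure Hadamard products; no matrix multiplication

# Toy usage: 8 tokens with 4 channels of random binary spikes.
torch.manual_seed(0)
q = (torch.rand(8, 4) > 0.5).float()
k = (torch.rand(8, 4) > 0.5).float()
v = (torch.rand(8, 4) > 0.5).float()
print(simplified_spiking_attention(q, k, v))
```

Note the key design consequence: because every operation is element-wise, the number of intermediate spike events scales with $ND$ rather than $N^2$, which is the source of the communication-complexity reduction claimed above.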