Dynamic Relational Priming Improves Transformers in Multivariate Time Series Forecasting
Abstract
Standard attention mechanisms in transformers employ static token representations that remain unchanged across all pairwise computations in each layer. This limits how well the attention mechanism can align with the potentially diverse dynamics of each token-pair interaction. While standard attention excels in domains with relatively homogeneous relationships, it may be inadequate for capturing the heterogeneous inter-channel dependencies of multivariate time series (MTS) data, where different channel-pair interactions within a single system may be governed by entirely different physical laws or temporal dynamics. To better align the attention mechanism with such domain phenomena, we propose attention with dynamic relational priming (prime attention). Prime attention modulates token representations separately for each token pair, tailoring each pairwise interaction to its specific relationship. Our results demonstrate that prime attention consistently outperforms standard attention across benchmarks, achieving up to a 6.5% improvement in forecasting accuracy. In addition, prime attention achieves comparable performance using up to 40% shorter input sequences than standard attention, demonstrating its superior relational modeling capability and potential for data efficiency.
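To make the contrast concrete, the sketch below compares standard attention, where every pair (i, j) sees the same static key, against one plausible instantiation of per-pair priming in which each key is modulated differently for each query before the score is computed. The additive pair features, the sigmoid gate, and the priming projection `Wp` are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import torch

def standard_attention(x, Wq, Wk, Wv):
    # x: (n, d) channel tokens; every pair (i, j) uses the same static k_j.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / q.shape[-1] ** 0.5           # (n, n)
    return torch.softmax(scores, dim=-1) @ v

def prime_attention(x, Wq, Wk, Wv, Wp):
    # Hypothetical per-pair priming: key j is gated differently for each
    # query i, so the i-j interaction sees a representation of token j
    # adapted to that specific relationship.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    pair = q.unsqueeze(1) + k.unsqueeze(0)          # (n, n, d) pair features
    gate = torch.sigmoid(pair @ Wp)                 # (n, n, d) per-pair gates
    k_primed = k.unsqueeze(0) * gate                # (n, n, d) pair-specific keys
    scores = (q.unsqueeze(1) * k_primed).sum(-1) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v        # (n, d)

# Toy usage: 8 channels, model dimension 16.
n, d = 8, 16
x = torch.randn(n, d)
Wq, Wk, Wv, Wp = (torch.randn(d, d) / d ** 0.5 for _ in range(4))
print(prime_attention(x, Wq, Wk, Wv, Wp).shape)     # torch.Size([8, 16])
```

Note that materializing the (n, n, d) pair tensor makes this sketch more expensive than standard attention; it is meant only to illustrate how a pair-specific modulation differs from reusing one static key per token.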