Spotlight in Workshop: Continuous Time Perspectives in Machine Learning
MQTransformer: Context Dependent Attention and Bregman Volatility
Carson Eisenach · Dhruv Madeka · Kevin Chen · Lee Dicker
In many forecasting applications (e.g., retail demand, electricity load, weather, and finance), forecasts must satisfy certain properties, such as exhibiting context-dependent and time-varying seasonality patterns and avoiding excessive revision as new information becomes available. Here we propose MQ-Transformer, a new neural network forecasting architecture that addresses these issues by incorporating three architectural improvements over the current state of the art: 1) a novel decoder-encoder attention module that aligns the historical and future time periods, 2) a novel positional encoding that learns seasonality from the historical time series, and 3) a novel decoder self-attention module that allows the network to minimize forecast volatility. We then define a new measure of forecast volatility, Bregman Volatility, to understand one major source of the improvement from our model. Bregman Volatility allows us to compute the optimal volatility of a sequence of forecasts in terms of the improvement in forecast accuracy over that time period. We show both theoretically and empirically that the decoder self-attention module optimizes Bregman Volatility and thereby also improves forecast accuracy.
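As a rough illustration of the volatility notion above, the sketch below measures the revision volatility of a sequence of forecasts for the same target as the sum of Bregman divergences between consecutive revisions, here using the squared Euclidean generator. The function names (bregman_divergence_sq, forecast_volatility) and this particular aggregation are illustrative assumptions, not the paper's exact construction; the precise definition of Bregman Volatility and its optimal value in terms of accuracy improvement are given in the paper.

```python
import numpy as np

def bregman_divergence_sq(p, q):
    """Bregman divergence generated by the squared Euclidean norm:
    D(p, q) = ||p - q||^2. Other convex generators yield other divergences."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum((p - q) ** 2))

def forecast_volatility(forecasts, divergence=bregman_divergence_sq):
    """Illustrative revision volatility of a sequence of forecasts f_1, ..., f_T
    issued for the same target as new information arrives: the sum of
    divergences between consecutive revisions (an assumption for this sketch)."""
    return sum(divergence(forecasts[t + 1], forecasts[t])
               for t in range(len(forecasts) - 1))

# Example: three successive revisions of a 2-step-ahead demand forecast.
revisions = [np.array([10.0, 12.0]),
             np.array([11.0, 11.5]),
             np.array([11.2, 11.4])]
print(forecast_volatility(revisions))  # total squared revision volatility
```

Under this reading, a forecaster that revises heavily back and forth without a corresponding gain in accuracy accumulates excess volatility, which is the behavior the decoder self-attention module is designed to suppress.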