Poster
in
Workshop: Workshop on Theoretical Foundations of Foundation Models (TF2M)
Transformer Designs for In-Context Learning in Foundation Models for Time Series Forecasting with Covariates
Afrin Dange · Raj · Praneeth Kumar Netrapalli · Sunita Sarawagi
Recent foundation models (FMs) for time series forecasting (TSF) have shown promising results in zero-shot generalization to new series. However, when time series are associated with input covariates, these models cannot model the series-specific dependence of the forecasted values on the covariates. We identify that historical values in TSF implicitly provide labeled data, which can be leveraged for in-context learning (ICL). While transformers have demonstrated ICL capabilities for regression tasks, harnessing them as FMs requires analyzing the impact of what constitutes a token in the transformer, the type of attention, and the placement of loss functions during pre-training. We study three existing tokenization schemes for regression tasks in terms of their training convergence and ICL capacity. We propose a modified shifted causal attention designed for faster convergence during pre-training, since it allows the next-token loss to be imposed at multiple positions. Further, it combines the covariates and target such that ICL for linear regression is achievable in just one layer. For time-series data, a popular tokenization method in existing FMs is patching the input series. Our theoretical analysis shows that such tokenization is suboptimal for ICL on time series with covariates.
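To make the setup concrete, the sketch below illustrates one possible way of turning a series' own history into labeled examples for in-context regression, pairing each covariate vector with a shifted target so a causal transformer can be trained with a loss at every position. This is an illustrative assumption, not the paper's exact tokenization or attention design; the data, dimensions, and shifted pairing are hypothetical choices for exposition.

```python
# Illustrative sketch (hypothetical construction, not the authors' method):
# building tokens for in-context linear regression from a series' history.
# Each token concatenates the covariate vector x_t with the previous target
# y_{t-1}, so a causal transformer can be supervised with a next-target loss
# at every position in the context.
import numpy as np

rng = np.random.default_rng(0)
T, d = 16, 3                        # context length, covariate dimension
w_true = rng.normal(size=d)         # series-specific linear map (unknown to the model)

X = rng.normal(size=(T, d))                     # covariates x_1 ... x_T
y = X @ w_true + 0.01 * rng.normal(size=T)      # targets y_1 ... y_T

# Token t = [x_t ; y_{t-1}], with y_0 set to 0 as a placeholder.
y_shifted = np.concatenate([[0.0], y[:-1]])
tokens = np.concatenate([X, y_shifted[:, None]], axis=1)   # shape (T, d + 1)

# Standard causal mask: position t may attend to positions 1..t only.
causal_mask = np.tril(np.ones((T, T), dtype=bool))

# With this shifted pairing, the model's output at position t can be compared
# against y_t, giving a training signal at all T positions rather than only
# at the final query position.
print(tokens.shape, causal_mask.shape)
```

Under this kind of pairing, every historical step acts as an (input, label) demonstration, which is the sense in which TSF history implicitly provides labeled data for ICL.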