(Sparse) Attention to the Details: Preserving Spectral Fidelity in ML-based Weather Forecasting Models
Abstract
We introduce \textsc{Mosaic}, a probabilistic weather forecasting model that addresses two sources of spectral degradation in ML-based weather prediction: deterministic training that targets the ensemble mean, and compressive encoding that creates an information bottleneck. \textsc{Mosaic} combines learned functional perturbations for ensemble forecasting with block-sparse attention, a hardware-aligned formulation that shares keys and values across spatially adjacent queries, so that each query block dynamically attends to the most relevant regions. Because it captures arbitrarily long-range dependencies at linear cost, \textsc{Mosaic} processes high-resolution weather data without compression. On IFS HRES data, \textsc{Mosaic} at 1.5° resolution matches or outperforms models trained on 0.25° data, and its individual ensemble members exhibit near-perfect spectral alignment across all resolved frequencies.
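To make the block-sparse idea concrete, the following is a minimal NumPy sketch of one generic way to share keys and values across adjacent queries. It is not the \textsc{Mosaic} implementation: the function name `block_sparse_attention`, the mean-pooled routing score, the `block_size` and `top_k` parameters, and the 1-D token layout are all assumptions made for illustration. Note also that the coarse routing step in this toy version scores every pair of blocks, so it does not reproduce the linear cost described above; only the fine-grained attention step scales linearly for a fixed `top_k`.

```python
import numpy as np

def block_sparse_attention(q, k, v, block_size=4, top_k=2):
    """Toy block-sparse attention over a 1-D token sequence (illustrative only).

    q, k: arrays of shape (n_tokens, dim); v: (n_tokens, dim_v).
    n_tokens must be divisible by block_size.
    Each query block selects `top_k` key blocks, and every query in that
    block attends only to the tokens of those shared key/value blocks.
    """
    n, d = q.shape
    assert n % block_size == 0, "n_tokens must be divisible by block_size"
    nb = n // block_size
    dv = v.shape[-1]
    qb = q.reshape(nb, block_size, d)
    kb = k.reshape(nb, block_size, d)
    vb = v.reshape(nb, block_size, dv)

    # Coarse routing (assumed scheme): score block pairs with mean-pooled
    # queries and keys, then keep the top_k key blocks per query block.
    block_scores = qb.mean(axis=1) @ kb.mean(axis=1).T      # (nb, nb)
    selected = np.argsort(-block_scores, axis=1)[:, :top_k]  # (nb, top_k)

    out = np.empty((nb, block_size, dv))
    for i in range(nb):
        # Keys and values shared by all queries in block i.
        k_sel = kb[selected[i]].reshape(-1, d)
        v_sel = vb[selected[i]].reshape(-1, dv)
        logits = qb[i] @ k_sel.T / np.sqrt(d)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(logits)
        weights /= weights.sum(axis=1, keepdims=True)
        out[i] = weights @ v_sel
    return out.reshape(n, dv), selected
```

When `top_k` equals the number of blocks, every query sees every key and the result coincides with ordinary dense softmax attention, which gives a simple sanity check for the sketch.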