Beyond Point Predictions: Manifold Expansion and Dual Alignment for Robust Time Series Distillation
Abstract
Knowledge Distillation (KD) promises to bridge the gap between the high computational cost of Transformer-based models and the limited expressiveness of linear models in long-term time series forecasting. Existing time series distillation methods inherit the computer vision paradigm, constraining student models by minimizing point-wise prediction-matching errors (output-level distillation). However, blindly mimicking teacher predictions, which are often uncertain, can induce negative transfer. To address this, we propose Dynamic Structural Distillation (DSD), a robust framework that goes beyond the prediction-matching paradigm. First, we design LMP-Net, which leverages manifold expansion to project features into a higher-dimensional latent space, alleviating the expressiveness bottleneck while preserving lightweight inference. Second, to address the architectural mismatch between teacher and student, we propose Dual Manifold Alignment, which employs Similarity-Preserving Knowledge Distillation (SPKD) and Optimal Transport (OT) to align features at the topological and geometric levels, respectively. Finally, we introduce Regime-Aware Adaptive Distillation (RAAD), which mitigates teacher misguidance via a dataset-level regime prior and a confidence-based adaptive gating mechanism. Extensive experiments on five benchmarks show that DSD is compatible with diverse Transformer-based teachers, mitigating negative transfer while achieving a favorable accuracy--efficiency trade-off. An anonymized implementation is available at https://anonymous.4open.science/r/DSD-master-4B8F.