Time-Series Decomposition as a Standalone Task: A Mechanism-Driven Diagnostic Benchmark
Abstract
We benchmark time-series decomposition as a standalone evaluation task. While decomposition outputs are widely used to interpret trend and periodic structure, their quality is often assessed informally, and no unified benchmark exists for comparing component recovery under controlled generative mechanisms. We introduce a synthetic evaluation suite with explicit trend and cycle taxonomies, a unified interface covering representative decomposition families, and complementary metrics that capture distinct error modes (shape, phase, and spectral fidelity). In stationary periodic regimes, STL-family methods perform near ceiling. Under non-stationary periodicity (frequency drift, regime switching), fixed-period priors degrade phase accuracy, while subspace and time-frequency methods better preserve seasonal consistency, although adaptive spectral methods may require tuning. We further extend the benchmark with a downstream scientific-discovery track (symbolic regression on decomposed components), showing that a decompose-then-regress pipeline materially improves recoverability and reduces expression complexity, thereby linking decomposition quality to structure discovery. We release a pip-installable package and a lightweight web interface to make the benchmark and its results easily accessible.