Causal Feature Learning via Generalized Rayleigh Quotients
Abstract
Extracting causally meaningful features from time-series data is fundamental to robust machine learning under distribution shift. In process monitoring, existing methods struggle to maintain detection performance when operating conditions change. Current approaches capture either temporal causal relationships or cross-environment invariance, but not both. We propose Causal Feature Learning (CFL), a unified framework that jointly optimizes for temporal relevance and invariance to environment-dependent mean shifts. CFL formulates feature extraction as a generalized Rayleigh-quotient problem: it maximizes correlation with the target variables while penalizing sensitivity to mean shifts across environments. Theoretical analysis establishes conditions under which CFL identifies a mean-invariant predictive subspace. Experiments on the Tennessee Eastman Process show that CFL achieves an average fault detection rate of 93.69\%, outperforming 15 baseline methods and supporting the benefit of capturing temporal relevance and environment invariance jointly.
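To make the generalized Rayleigh-quotient idea concrete, the following is a minimal sketch and not the formulation used in this paper. It assumes one plausible instantiation: the numerator matrix A is built from the feature-target cross-covariance (relevance), and the denominator matrix B is the feature covariance plus a weighted between-environment scatter of feature means (the mean-shift penalty). The function name `cfl_directions`, the weight `lam`, and the `ridge` term are hypothetical names introduced only for illustration.

```python
import numpy as np

def cfl_directions(X, y, env, lam=1.0, k=1, ridge=1e-6):
    """Hypothetical sketch: top-k maximizers of (w^T A w) / (w^T B w).

    A = C C^T, with C the feature-target cross-covariance (relevance).
    B = Cov(X) + lam * S_env + ridge * I, with S_env the between-environment
        scatter of feature means (assumed form of the mean-shift penalty).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    Y = np.asarray(y, dtype=float).reshape(n, -1)
    env = np.asarray(env)

    mu = X.mean(axis=0)
    Xc = X - mu
    Yc = Y - Y.mean(axis=0)

    C = Xc.T @ Yc / n                 # d x m cross-covariance
    A = C @ C.T                       # relevance matrix (symmetric, PSD)

    S_env = np.zeros((d, d))
    for e in np.unique(env):
        mask = env == e
        diff = X[mask].mean(axis=0) - mu
        S_env += mask.mean() * np.outer(diff, diff)  # weighted by env frequency

    B = Xc.T @ Xc / n + lam * S_env + ridge * np.eye(d)

    # Reduce A w = rho B w to a standard symmetric problem via B = L L^T.
    L = np.linalg.cholesky(B)
    Z = np.linalg.solve(L, A)         # L^{-1} A
    M = np.linalg.solve(L, Z.T)       # L^{-1} A L^{-T} (A is symmetric)
    M = (M + M.T) / 2                 # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(M)    # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]
    W = np.linalg.solve(L.T, vecs[:, order])  # map back: w = L^{-T} v
    return W, vals[order]

# Synthetic usage: both features track y, but feature 1 also shifts with the environment.
rng = np.random.default_rng(0)
n = 2000
env = rng.integers(0, 2, size=n)
y = rng.normal(size=n)
x0 = y + 0.5 * rng.normal(size=n)
x1 = y + 0.5 * rng.normal(size=n) + 5.0 * env
W, rho = cfl_directions(np.column_stack([x0, x1]), y, env, lam=10.0)
```

In this toy setting the leading direction should place most of its weight on the feature whose mean does not move with the environment. Because the relevance matrix here has rank equal to the number of target variables, only that many directions carry a nonzero quotient.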