Streaming Covariate Balancing via Discrepancy-Based Feature Coresets
Abstract
Real-time estimation of average treatment effects (ATE) from streaming observational data poses two key challenges: strict memory constraints that preclude storing the full data history, and distributional shifts in both the treatment assignment and the outcome-generating process. Existing methods either require offline access to the entire dataset for covariate balancing or rely on parametric online models that are vulnerable to misspecification under such shifts. This paper proposes a model-agnostic method for ATE estimation in streaming data that addresses both challenges. Drawing on discrepancy theory, we first compress the stream into feature coresets that preserve covariate balancing objectives over a rich nonparametric function class, enabling linear-time updates with bounded memory. We then learn balancing weights directly, bypassing parametric propensity score estimation, which improves robustness to shifts in treatment assignment; balancing over an expressive function space in turn makes the estimator more adaptive to shifts in the outcome-generating process. Theoretically, we establish convergence guarantees with explicit bounds on memory usage and computational complexity. Empirically, extensive experiments on synthetic and real-world datasets show that the proposed method is effective and robust, consistently outperforming existing techniques.
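To make the compression idea concrete, the following is a minimal sketch of one discrepancy-style halving step on a buffer of feature vectors. It is an illustration under stated assumptions, not the algorithm proposed in this paper: the function name `halve`, the consecutive-pair scheme, and the greedy sign rule are all hypothetical choices. Each kept point is meant to carry twice its original weight, so the error in the buffer's feature sum equals the signed residual that the greedy rule keeps small.

```python
import numpy as np


def halve(features):
    """Keep one row from each consecutive pair of rows (hypothetical helper).

    Returns the kept row indices and the residual r, which satisfies
    2 * features[kept].sum(axis=0) - features.sum(axis=0) == r.
    """
    n, d = features.shape
    if n % 2 != 0:
        raise ValueError("buffer must hold an even number of rows")
    residual = np.zeros(d)
    kept = []
    for j in range(0, n, 2):
        diff = features[j] - features[j + 1]
        # Greedy sign choice: pick the sign whose cross term with the
        # running residual is non-positive, so the squared norm grows
        # by at most ||diff||^2 at each step.
        if residual @ diff <= 0:
            residual += diff      # keep row j
            kept.append(j)
        else:
            residual -= diff      # keep row j + 1
            kept.append(j + 1)
    return np.array(kept), residual


# Usage on synthetic data (assumed stand-in for nonparametric features).
rng = np.random.default_rng(0)
buffer = rng.normal(size=(64, 8))
kept, residual = halve(buffer)
coreset_mean = buffer[kept].mean(axis=0)   # equal doubled weights
full_mean = buffer.mean(axis=0)
print("kept rows:", kept.size)
print("mean feature error:", np.linalg.norm(coreset_mean - full_mean))
```

Applying such a step whenever a fixed-size buffer fills would keep memory bounded while the stream grows. How the features are constructed, how coresets at different levels are merged, and how balancing weights are then fitted on the coreset are left out of this sketch.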