Abstract:
Resulting from non-random sample selection caused by both the treatment and the outcome, collider bias poses a unique challenge to treatment effect estimation from observational data whose distribution differs from that of the target population. In this paper, we rethink collider bias from an out-of-distribution (OOD) perspective, treating the entire data space of the target population as two different environments: the observational data selected from the target population belong to a seen environment labeled S=1, and the missing unselected data belong to an unseen environment labeled S=0. Based on this OOD formulation, we utilize small-scale representative data from the entire data space with no environment labels and propose a novel method, the Coupled Counterfactual Generative Adversarial Model (C²GAM), to simultaneously generate the missing S=0 samples in the observational data and the missing S labels in the small-scale representative data. With the help of C²GAM, collider bias can be addressed by combining the generated S=0 samples with the observational data to estimate treatment effects. Extensive experiments on synthetic and real-world data demonstrate that plugging C²GAM into existing treatment effect estimators achieves significant performance improvements.
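To make the collider-bias setting concrete, the following is a minimal simulation sketch, not the C²GAM method itself. It assumes a randomized binary treatment T, an outcome Y = 2·T + Gaussian noise (so the true effect is 2.0), and a hypothetical selection rule P(S=1 | T, Y) = sigmoid(-1 + 2T + 2Y). Because S depends on both T and Y, a difference-in-means estimate on the S=1 data alone is biased. Adding back the S=0 samples removes the bias; here the true unselected samples serve as an oracle stand-in for the samples C²GAM would generate. All function names and parameters are illustrative assumptions.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def diff_in_means(rows):
    """Difference-in-means effect estimate: mean Y of treated minus mean Y of controls."""
    treated = [y for t, y in rows if t == 1]
    control = [y for t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def simulate(n: int = 100_000, true_effect: float = 2.0, seed: int = 0):
    """Hypothetical data-generating process; returns (S=1 rows, S=0 rows)."""
    rng = random.Random(seed)
    selected, unselected = [], []
    for _ in range(n):
        t = 1 if rng.random() < 0.5 else 0            # randomized treatment (no confounding)
        y = true_effect * t + rng.gauss(0.0, 1.0)     # outcome
        # Selection depends on BOTH t and y, so S is a collider of T and Y.
        s = 1 if rng.random() < sigmoid(-1.0 + 2.0 * t + 2.0 * y) else 0
        (selected if s == 1 else unselected).append((t, y))
    return selected, unselected


selected, unselected = simulate()
naive = diff_in_means(selected)                    # observational (S=1) data only
combined = diff_in_means(selected + unselected)    # S=1 data plus oracle S=0 samples
print(f"naive={naive:.2f} combined={combined:.2f}")
```

Under these assumptions, the naive estimate should land noticeably below the true effect of 2.0. Selection truncates the control arm toward high outcomes far more than the treated arm, while the combined estimate recovers roughly 2.0. Treatment is randomized here only to isolate collider bias from confounding; with real observational data, the combined sample would be passed to a proper treatment effect estimator instead of a raw difference in means.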