Learning Treatment Representations for Downstream Instrumental Variable Regression
Abstract
Traditional instrumental variable (IV) estimators cannot accommodate more treatments than instruments, a critical limitation for high-dimensional, unstructured treatments such as clinical treatment pathways. Current practice, which applies unsupervised dimension reduction before IV estimation, suffers from substantial omitted treatment bias because the representation learning step ignores the instrument. We propose a framework that constructs treatment representations by explicitly incorporating the instrumental variables. We prove that this instrument-guided approach identifies the optimal outcome-prediction directions even when instruments are limited. Experiments on large-scale, semi-synthetic clinical data derived from a major hospital, along with additional simulations, show that our approach significantly outperforms conventional two-stage methods.
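
To make the conventional baseline concrete, the following is a minimal sketch of the "unsupervised dimension reduction, then IV" pipeline described above. It is not the proposed method. It assumes linear models, PCA as the unsupervised reducer, two-stage least squares as the IV estimator, and purely synthetic data; the function name `pca_then_2sls` and all simulation parameters are hypothetical choices made for illustration.

```python
import numpy as np

def pca_then_2sls(X, Z, y, k):
    """Illustrative baseline: PCA on treatments X, then 2SLS of y on the
    k PCA scores using instruments Z. The PCA step never looks at Z or y."""
    Xc = X - X.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    yc = y - y.mean()
    # PCA via SVD: rows of Vt are the principal directions of the treatments.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                 # (p, k) loading matrix
    T = Xc @ W                   # (n, k) low-dimensional representation
    # First stage: project the representation onto the instruments.
    T_hat = Zc @ np.linalg.lstsq(Zc, T, rcond=None)[0]
    # Second stage: regress the outcome on the fitted representation.
    beta = np.linalg.lstsq(T_hat, yc, rcond=None)[0]
    return W, beta

# Synthetic example (assumed setup): p = 10 treatments but only k = 2 instruments.
rng = np.random.default_rng(0)
n, p, k = 5000, 10, 2
Z = rng.normal(size=(n, k))                    # instruments
U = rng.normal(size=n)                         # unobserved confounder
A = rng.normal(size=(k, p))                    # instrument-to-treatment map
X = Z @ A + U[:, None] + rng.normal(size=(n, p))
theta = rng.normal(size=p)                     # true treatment effects
y = X @ theta + 2.0 * U + rng.normal(size=n)

W, beta = pca_then_2sls(X, Z, y, k)
implied_theta = W @ beta                       # effect vector implied by the baseline
```

Because the directions in `W` are chosen only to explain variance in `X`, treatment variation outside their span is dropped from the second stage, which is the source of the omitted treatment bias the abstract refers to.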