Gradient-Based Causal Tree Ensembles: A Backbone Architecture for Heterogeneous Treatment Effects
Abstract
Estimating Heterogeneous Treatment Effects (HTE) from observational data is essential in fields such as healthcare and policy-making, where randomized experiments are often impractical. While methods based on representation learning have shown promise, recent studies suggest that tree-based approaches may perform better on tabular data, particularly in the presence of uninformative features. We introduce GRAdient-based Causal tree Ensembles (GRACE), a novel tree-based architecture for HTE estimation that incorporates multi-way, oblique, and soft splits, enabling end-to-end training via backpropagation. GRACE can be integrated into existing models as a replacement for fully connected neural network layers. Across diverse benchmarks involving binary and non-binary treatment settings, GRACE consistently outperforms neural network and tree-based baselines, often by a substantial margin. We further analyze GRACE as an extension of fully connected neural network layers and conduct ablation studies to quantify the contribution of each architectural component to the performance gains. These results position GRACE as a strong foundation for flexible, robust, and accurate HTE estimation.
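To make the terms multi-way, oblique, and soft concrete, the following minimal sketch shows one generic way such a split can be written. It is not the GRACE implementation: the function names, the softmax routing, and the `temperature` parameter are all assumptions chosen for illustration. The split is oblique because each branch score is a linear combination of all features, multi-way because there are K branches rather than two, and soft because each input is routed to every branch with a probability instead of a hard assignment.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_oblique_split(x, weights, biases, temperature=1.0):
    """Routing probabilities over K branches for one feature vector x.

    weights: K rows, one weight vector per branch (oblique: uses all features).
    biases:  K offsets, one per branch.
    temperature: hypothetical sharpness knob; small values approach a hard split.
    """
    scores = [
        (sum(w_j * x_j for w_j, x_j in zip(w, x)) + b) / temperature
        for w, b in zip(weights, biases)
    ]
    return softmax(scores)


def soft_tree_output(x, weights, biases, leaf_values, temperature=1.0):
    """Depth-1 soft tree: probability-weighted average of the K leaf values."""
    probs = soft_oblique_split(x, weights, biases, temperature)
    return sum(p * v for p, v in zip(probs, leaf_values))


# Illustrative values only (not taken from the paper).
x = [2.0, 0.5]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # three branches, two features
biases = [0.0, 0.0, 0.0]
leaf_values = [1.0, 2.0, 3.0]

print(soft_oblique_split(x, weights, biases))
print(soft_tree_output(x, weights, biases, leaf_values))
```

Because the output is a smooth function of `weights`, `biases`, and `leaf_values`, gradients with respect to all of them exist, which is the property that allows a layer of this kind to be trained by backpropagation and placed where a fully connected layer would otherwise sit.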