Session

Causal Inference 2


Fri 13 July 8:00 - 8:20 PDT

Orthogonal Machine Learning: Power and Limitations

Ilias Zadik · Lester Mackey · Vasilis Syrgkanis

Double machine learning provides n^{1/2}-consistent estimates of parameters of interest even when high-dimensional or nonparametric nuisance parameters are estimated at an n^{-1/4} rate. The key is to employ Neyman-orthogonal moment equations which are first-order insensitive to perturbations in the nuisance parameters. We show that the n^{-1/4} requirement can be improved to n^{-1/(2k+2)} by employing a k-th order notion of orthogonality that grants robustness to more complex or higher-dimensional nuisance parameters. In the partially linear regression setting popular in causal inference, we show that we can construct second-order orthogonal moments if and only if the treatment residual is not normally distributed. Our proof relies on Stein's lemma and may be of independent interest. We conclude by demonstrating the robustness benefits of an explicit doubly-orthogonal estimation procedure for the treatment effect.
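
For concreteness, the first-order (Neyman-orthogonal) baseline can be sketched for the partially linear model Y = theta*T + g(X) + eps, T = m(X) + eta: cross-fit the nuisances m and g with off-the-shelf learners, then recover theta from a residual-on-residual regression. The simulated data and the random-forest nuisance learners below are illustrative assumptions rather than the paper's construction, and the sketch covers only the first-order orthogonal moment, not the higher-order variants developed in the paper.

```python
# Minimal double ML sketch for the partially linear model (illustrative
# assumptions: simulated data, random-forest nuisance learners, 2-fold
# cross-fitting; first-order orthogonal moment only).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, d, theta = 2000, 5, 1.0
X = rng.normal(size=(n, d))
T = np.sin(X[:, 0]) + rng.normal(size=n)              # treatment with nonlinear confounding
Y = theta * T + np.cos(X[:, 1]) + rng.normal(size=n)  # outcome

res_T, res_Y = np.empty(n), np.empty(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    # Cross-fitting: estimate the nuisances m(X)=E[T|X] and E[Y|X] on one
    # fold and residualize the held-out fold.
    m_hat = RandomForestRegressor(random_state=0).fit(X[train], T[train])
    g_hat = RandomForestRegressor(random_state=0).fit(X[train], Y[train])
    res_T[test] = T[test] - m_hat.predict(X[test])
    res_Y[test] = Y[test] - g_hat.predict(X[test])

# Neyman-orthogonal moment: regress outcome residuals on treatment residuals.
theta_hat = res_T @ res_Y / (res_T @ res_T)
print(f"theta_hat = {theta_hat:.3f}  (true theta = {theta})")
```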

Fri 13 July 8:20 - 8:40 PDT

Minimal I-MAP MCMC for Scalable Structure Discovery in Causal DAG Models

Raj Agrawal · Caroline Uhler · Tamara Broderick

Learning a Bayesian network (BN) from data can be useful for decision-making or discovering causal relationships. However, traditional methods often fail in modern applications, which exhibit a larger number of observed variables than data points. The resulting uncertainty about the underlying network as well as the desire to incorporate prior information recommend a Bayesian approach to learning the BN, but the highly combinatorial structure of BNs poses a striking challenge for inference. The current state-of-the-art methods such as order MCMC are faster than previous methods but prevent the use of many natural structural priors and still have running time exponential in the maximum indegree of the true directed acyclic graph (DAG) of the BN. We here propose an alternative posterior approximation based on the observation that, if we incorporate empirical conditional independence tests, we can focus on a high-probability DAG associated with each order of the vertices. We show that our method allows the desired flexibility in prior specification, removes timing dependence on the maximum indegree, and yields provably good posterior approximations; in addition, we show that it achieves superior accuracy, scalability, and sampler mixing on several datasets.
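
The core idea can be illustrated with a toy sketch under strong simplifying assumptions: for a fixed vertex order, each node keeps only the predecessors it appears conditionally dependent on (giving one high-probability DAG per order), and a Metropolis-Hastings walk over orders scores the resulting DAGs. The regression t-statistic standing in for a conditional independence test and the Gaussian BIC-style score below are placeholders, not the paper's actual test, prior, or sampler.

```python
# Toy sketch of minimal I-MAP MCMC (illustrative CI test and score).
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 6
# Simulate data from a simple chain DAG: X0 -> X1 -> ... -> X5.
data = np.empty((n, d))
data[:, 0] = rng.normal(size=n)
for j in range(1, d):
    data[:, j] = 0.8 * data[:, j - 1] + rng.normal(size=n)

def imap_and_score(order, thresh=3.0):
    """Build the minimal I-MAP for `order` via per-node regressions and
    return (parent sets, Gaussian BIC-style score)."""
    parents, score = {}, 0.0
    for i, v in enumerate(order):
        preds, kept = list(order[:i]), []
        if preds:
            X = np.column_stack([data[:, preds], np.ones(n)])
            beta, *_ = np.linalg.lstsq(X, data[:, v], rcond=None)
            resid = data[:, v] - X @ beta
            cov = (resid @ resid / n) * np.linalg.pinv(X.T @ X)
            tstats = beta[:-1] / np.sqrt(np.diag(cov)[:-1])
            # Crude stand-in for a conditional independence test.
            kept = [p for p, t in zip(preds, tstats) if abs(t) > thresh]
        # Refit with the kept parents and add this node's BIC contribution.
        X = np.column_stack([data[:, kept], np.ones(n)]) if kept else np.ones((n, 1))
        beta, *_ = np.linalg.lstsq(X, data[:, v], rcond=None)
        resid = data[:, v] - X @ beta
        score += -0.5 * n * np.log(resid @ resid / n) - 0.5 * len(kept) * np.log(n)
        parents[int(v)] = [int(p) for p in kept]
    return parents, score

# Metropolis-Hastings over vertex orders via adjacent transpositions.
order = list(rng.permutation(d))
_, cur = imap_and_score(order)
for _ in range(300):
    i = int(rng.integers(d - 1))
    prop = order.copy()
    prop[i], prop[i + 1] = prop[i + 1], prop[i]
    _, new = imap_and_score(prop)
    if np.log(rng.random()) < new - cur:   # accept/reject on the DAG score
        order, cur = prop, new

print("sampled order:", [int(v) for v in order])
print("minimal I-MAP parent sets:", imap_and_score(order)[0])
```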

Fri 13 July 8:40 - 8:50 PDT

Accurate Inference for Adaptive Linear Models

Yash Deshpande · Lester Mackey · Vasilis Syrgkanis · Matt Taddy

Estimators computed from adaptively collected data do not behave like their non-adaptive brethren. Rather, the sequential dependence of the collection policy can lead to severe distributional biases that persist even in the infinite data limit. We develop a general method, W-decorrelation, for transforming the bias of adaptive linear regression estimators into variance. The method uses only coarse-grained information about the data collection policy and does not need access to propensity scores or exact knowledge of the policy. We bound the finite-sample bias and variance of the W-estimator and develop asymptotically correct confidence intervals based on a novel martingale central limit theorem. We then demonstrate the empirical benefits of the generic W-decorrelation procedure in two different adaptive data settings: the multi-armed bandit and the autoregressive time series.
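
As a rough illustration, the sketch below forms a W-decorrelated estimate beta_W = beta_OLS + W (y - X beta_OLS) on epsilon-greedy two-armed bandit data, building W one column at a time from the observed covariates. The particular recursion and the tuning parameter lam are simplifying assumptions; the paper's construction, tuning, and confidence intervals are more careful than this sketch.

```python
# W-decorrelation sketch on adaptively collected bandit data
# (illustrative recursion for W; lam is an assumed tuning parameter).
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.3])                 # true arm means
n, eps, lam = 2000, 0.1, 5.0

X, y = np.zeros((n, 2)), np.zeros(n)
counts, means = np.zeros(2), np.zeros(2)
for t in range(n):
    # epsilon-greedy data collection: the adaptivity that biases OLS
    explore = rng.random() < eps or t < 2
    arm = int(rng.integers(2)) if explore else int(np.argmax(means))
    X[t, arm] = 1.0
    y[t] = mu[arm] + rng.normal()
    counts[arm] += 1
    means[arm] += (y[t] - means[arm]) / counts[arm]

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Build W column by column: w_t = (I - W_{1:t-1} X_{1:t-1}) x_t / (lam + ||x_t||^2),
# then beta_W = beta_OLS + W (y - X beta_OLS).
W, B = np.zeros((2, n)), np.eye(2)        # B tracks I - W_{1:t} X_{1:t}
for t in range(n):
    x = X[t]
    w = B @ x / (lam + x @ x)
    W[:, t] = w
    B -= np.outer(w, x)

beta_w = beta_ols + W @ (y - X @ beta_ols)
print("OLS estimate:           ", np.round(beta_ols, 3))
print("W-decorrelated estimate:", np.round(beta_w, 3))
print("true arm means:         ", mu)
```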

Fri 13 July 8:50 - 9:00 PDT

Detecting non-causal artifacts in multivariate linear regression models

Dominik Janzing · Bernhard Schölkopf

We consider linear models where d potential causes X1,...,Xd are correlated with one target quantity Y and propose a method to infer whether the association is causal or whether it is an artifact caused by overfitting or hidden common causes. We employ the idea that in the former case the vector of regression coefficients has 'generic' orientation relative to the covariance matrix Sigma_XX of X. Using an ICA-based model for confounding, we show that both confounding and overfitting yield regression vectors that concentrate mainly in the space of low eigenvalues of Sigma_XX.
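
A minimal sketch of the 'generic orientation' idea, under illustrative assumptions: fit the regression vector a and compare a^T Sigma_XX a / |a|^2 with the mean eigenvalue of Sigma_XX. For a generically oriented a the two are comparable, whereas an a concentrated in low-eigenvalue directions (as in the ICA-style confounded simulation below) pushes the ratio far below the mean eigenvalue. Both the simulated data and this particular summary statistic are stand-ins for the paper's exact model and test.

```python
# Sketch of a 'generic orientation' diagnostic (illustrative statistic and data).
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20

# Independent sources and a mixing matrix with an anisotropic spectrum.
E = rng.normal(size=(n, d))
A = rng.normal(size=(d, d)) * np.linspace(0.2, 3.0, d)   # scaled columns
X = E @ A                                                # observed potential causes

Y_causal = X @ rng.normal(size=d) + rng.normal(size=n)   # genuine effect of X on Y
Y_conf = E @ rng.normal(size=d) + rng.normal(size=n)     # hidden sources drive Y directly

def orientation_stat(X, Y):
    """Return (a^T Sigma a / |a|^2, mean eigenvalue of Sigma) for the fitted
    regression vector a; a 'generic' orientation makes the two comparable."""
    Sigma = np.cov(X, rowvar=False)
    a = np.linalg.lstsq(X - X.mean(0), Y - Y.mean(), rcond=None)[0]
    return (a @ Sigma @ a) / (a @ a), np.trace(Sigma) / d

for name, Y in [("causal", Y_causal), ("confounded", Y_conf)]:
    stat, mean_eig = orientation_stat(X, Y)
    print(f"{name:10s}: a'Sigma a / |a|^2 = {stat:8.3f}   mean eigenvalue = {mean_eig:8.3f}")
```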