
Workshop: The Neglected Assumptions In Causal Inference

Leveraging molecular negative controls for effect estimation in non-randomized human health and disease studies: a demonstrative simulation study

Jon Huang


Background Exploratory null-hypothesis significance testing (e.g., GWAS, EWAS) forms the backbone of molecular mechanism discovery; however, methods to identify true causal signals are underdeveloped. We evaluate two negative control approaches that use epigenomic data and biologically informed structural assumptions to quantitatively control for shared unmeasured confounding and recover unbiased effects.

Methods We consider the application of the control outcome calibration approach (COCA) and proximal g-computation (PGC) to case studies in reproductive genomics. COCA may be employed when the maternal epigenome has no direct effect on the phenotype and proxies shared unmeasured confounders; PGC may additionally be employed when suitable genetic instruments (e.g., mQTLs) are available. Baseline covariates were extracted from 777 mother-child pairs in a birth cohort with maternal blood and fetal cord DNA methylation array data. Treatment, negative control, and outcome values were simulated across 2,000 bootstrap replicates under a plasmode simulation framework. Ordinary least squares (COCA) and two-stage least squares (PGC) were fitted with bootstrapping to estimate treatment effects and standard errors under various settings of missing confounders (e.g., paternal data). Regression adjustment and a naive application of doubly robust, ensemble-learning efficient estimators were included for comparison.
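The COCA and PGC estimators named above can be illustrated in a toy linear setting. This is a minimal sketch under stated assumptions, not the study's plasmode code: the data-generating process, effect size, and all names (`simulate`, `naive`, `coca`, `pgc`, `bootstrap_se`) are hypothetical. COCA is shown in its linear form, regressing a negative control outcome on treatment `A` and outcome `Y` and taking ψ = -β_A / β_Y, which assumes the negative control depends on the unmeasured confounder only through the untreated potential outcome Y(0). PGC is shown as two-stage least squares with a negative control exposure `Z` (standing in for the kind of genetic instrument mentioned above) and a negative control outcome `W`, assuming a linear outcome bridge function.

```python
import numpy as np


def ols(y, X):
    """Least-squares coefficients for y ~ X (X already includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate(n, psi, rng):
    """Hypothetical linear DGP with one unmeasured confounder U (fully synthetic, not plasmode).

    z:      negative control exposure, related to U, no effect on w or y
    w:      negative control outcome for PGC, driven by U only
    w_coca: negative control outcome for COCA, driven by Y(0) only (assumed surrogate for U)
    """
    u = rng.normal(size=n)
    z = u + rng.normal(size=n)
    a = 0.8 * u + rng.normal(size=n)
    y0 = u + rng.normal(size=n)          # potential outcome without treatment
    y = psi * a + y0                     # observed outcome with true effect psi
    w = u + rng.normal(size=n)
    w_coca = 0.7 * y0 + rng.normal(size=n)
    return a, y, z, w, w_coca


def naive(a, y):
    """Unadjusted regression of Y on A; biased when U confounds A and Y."""
    return ols(y, np.column_stack([np.ones_like(a), a]))[1]


def coca(a, y, w):
    """Linear COCA: regress the negative control on A and Y, then psi = -b_A / b_Y."""
    b = ols(w, np.column_stack([np.ones_like(a), a, y]))
    return -b[1] / b[2]


def pgc(a, y, z, w):
    """Linear proximal estimate via two-stage least squares."""
    ones = np.ones_like(a)
    # Stage 1: predict the negative control outcome from A and the negative control exposure.
    first = np.column_stack([ones, a, z])
    w_hat = first @ ols(w, first)
    # Stage 2: regress Y on A and the fitted proxy; the coefficient on A estimates psi.
    return ols(y, np.column_stack([ones, a, w_hat]))[1]


def bootstrap_se(estimator, data, n_boot, rng):
    """Nonparametric bootstrap standard error: resample rows with replacement."""
    n = len(data[0])
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws.append(estimator(*(col[idx] for col in data)))
    return float(np.std(draws, ddof=1))


# Usage: the naive estimate absorbs confounding; COCA and PGC should land near psi = 0.5.
rng = np.random.default_rng(0)
a, y, z, w, w_coca = simulate(n=5000, psi=0.5, rng=rng)
print("naive:", naive(a, y))
print("COCA: ", coca(a, y, w_coca))
print("PGC:  ", pgc(a, y, z, w), "+/-", bootstrap_se(pgc, (a, y, z, w), n_boot=200, rng=rng))
```

In this favorable toy setting every structural assumption holds by construction; weakening the link between `Z` and `U`, or letting the negative control depend on `U` other than through Y(0), is one way to probe how quickly these estimators degrade.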

Results COCA and PGC performed well under simple data-generating processes. However, in real-world cohort simulations, COCA performed acceptably only in settings with strong proxy confounders and poorly otherwise (median bias 610%; coverage 29%). PGC performed slightly better. By contrast, simple covariate adjustment generally outperformed all other methods in bias and confidence interval coverage across scenarios (median bias 22%; coverage 71%).
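Bias and coverage summaries of the kind reported above can be computed along the following lines. This sketch assumes bias means absolute relative bias in percent and coverage means the share of nominal 95% Wald intervals (estimate ± 1.96 SE) that contain the true simulated effect; the exact definitions used in the study are not stated here, and `summarize` is a hypothetical helper.

```python
import numpy as np


def summarize(estimates, std_errors, true_effect, z=1.959963984540054):
    """Median absolute relative bias (%) and 95% Wald-interval coverage over replicates.

    Assumed definitions (hypothetical): relative bias = |est - true| / |true| * 100,
    coverage = fraction of intervals est +/- z * se containing the true effect.
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    rel_bias = 100.0 * np.abs(est - true_effect) / abs(true_effect)
    lower, upper = est - z * se, est + z * se
    covered = (lower <= true_effect) & (true_effect <= upper)
    return float(np.median(rel_bias)), float(covered.mean())


# Three hypothetical replicates with true effect 0.5:
# median absolute relative bias of about 10%, coverage of 2/3.
print(summarize([0.45, 0.55, 0.9], [0.05, 0.05, 0.1], 0.5))
```

Reporting the median rather than the mean keeps a few exploding replicates (as ratio-type estimators like COCA can produce when the denominator is near zero) from dominating the bias summary.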

Discussion Molecular epidemiology offers a key opportunity to leverage biological knowledge against unmeasured confounding, but these identification strategies are underutilized and understudied in this context. Negative control calibration or adjustment may help in the limited scenarios where its assumptions are fulfilled, but such methods should be tested with simulations closer to real-world conditions.
