Learning Models from Data with Measurement Error: Tackling Underreporting
Roy Adams · Yuelong Ji · Xiaobin Wang · Suchi Saria

Wed Jun 12th 02:35 -- 02:40 PM @ Grand Ballroom

Measurement error in observational datasets can lead to systematic bias in inferences based on these datasets. As studies based on observational data are increasingly used to inform decisions with real-world impact, it is critical that we develop a robust set of techniques for analyzing and adjusting for these biases. In this paper we present a method for estimating the distribution of an outcome given a binary exposure that is subject to underreporting. Our method is based on a missing data view of the measurement error problem, where the true exposure is treated as a latent variable that is marginalized out of a joint model. We prove three different conditions under which the outcome distribution can still be identified from data containing only error-prone observations of the exposure. We demonstrate this method on synthetic data and analyze its sensitivity to near violations of the identifiability conditions. Finally, we use this method to estimate the effects of maternal smoking and opioid use during pregnancy on childhood obesity, two important problems in public health. Using the proposed method, we estimate these effects using only subject-reported drug use data and substantially refine the range of estimates generated by a sensitivity analysis-based approach. Further, the estimates produced by our method are consistent with existing literature on both the effects of maternal smoking and the rate at which subjects underreport smoking.
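To make the underreporting setting concrete, the following is a minimal simulation sketch, not the estimator from the paper. It assumes a binary outcome, no false positive reports, reporting that is independent of the outcome given the true exposure, and a reporting rate that is known in advance. The last assumption is for illustration only; the paper's contribution is identifying the outcome distribution without it. All function names and parameter values (`simulate`, `outcome_rate`, `corrected_unexposed_rate`, the 0.3 exposure rate, the 0.6 reporting rate, the 0.5 and 0.2 outcome rates) are hypothetical. The sketch shows that the naive outcome rate among subjects who report no exposure is biased, and that writing it as a mixture over the latent true exposure lets the bias be undone.

```python
import random


def simulate(n=200_000, p_exposed=0.3, report_rate=0.6, seed=0):
    """Draw (true exposure, reported exposure, outcome) triples.

    Hypothetical generative model: exposed subjects report their exposure
    with probability `report_rate`; unexposed subjects never report it.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = rng.random() < p_exposed                  # latent true exposure
        a_obs = a and (rng.random() < report_rate)    # underreported version
        y = rng.random() < (0.5 if a else 0.2)        # outcome depends on a only
        rows.append((a, a_obs, y))
    return rows


def outcome_rate(rows, column, level):
    """Empirical P(Y = 1 | rows[column] == level)."""
    selected = [row[2] for row in rows if row[column] == level]
    return sum(selected) / len(selected)


def corrected_unexposed_rate(rows, report_rate):
    """Recover P(Y = 1 | A = 0) from reported exposure only.

    Assumes `report_rate` = P(A_obs = 1 | A = 1) is known, which is an
    illustrative simplification and not a requirement of the paper.
    """
    p_obs = sum(row[1] for row in rows) / len(rows)        # P(A_obs = 1)
    p_true = p_obs / report_rate                           # P(A = 1)
    hidden = p_true * (1 - report_rate) / (1 - p_obs)      # P(A = 1 | A_obs = 0)
    y_obs1 = outcome_rate(rows, 1, True)    # estimates P(Y = 1 | A = 1)
    y_obs0 = outcome_rate(rows, 1, False)   # mixture of exposed and unexposed
    # Invert: y_obs0 = hidden * P(Y=1 | A=1) + (1 - hidden) * P(Y=1 | A=0)
    return (y_obs0 - hidden * y_obs1) / (1 - hidden)


rows = simulate()
truth = outcome_rate(rows, 0, False)      # uses the latent exposure
naive = outcome_rate(rows, 1, False)      # uses the reported exposure
corrected = corrected_unexposed_rate(rows, report_rate=0.6)
print(f"truth={truth:.3f} naive={naive:.3f} corrected={corrected:.3f}")
```

Under these hypothetical parameters, the naive rate among reported non-users is about 0.24 in expectation, against a true rate of 0.20 among the unexposed, because roughly 15% of reported non-users are in fact exposed. When the reporting rate is unknown, a sensitivity analysis would sweep `report_rate` over a plausible range and report the resulting range of estimates.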

Author Information

Roy Adams (Johns Hopkins University)
Yuelong Ji (Johns Hopkins University)
Xiaobin Wang (Johns Hopkins University)
Suchi Saria (Johns Hopkins University)
