Poster in DMLR Workshop: Data-centric Machine Learning Research
Ensemble Fractional Imputation for Incomplete Categorical Data with a Graphical Model
Yonghyun Kwon · Jae-kwang Kim
Missing data is common in practice, and standard statistical inference can be biased when missingness is related to the outcome of interest. We present a frequentist approach, based on a graphical model and fractional imputation, that handles missing data in multivariate categorical variables under the missing at random (MAR) assumption. To avoid the curse of dimensionality in multivariate data, we adopt the idea behind random forests: we fit multiple reduced models and then combine them using model weights. The model weights are computed by a novel method, double projection, in which the observed likelihood is projected onto the class of graphical mixture models. The performance of the proposed method is investigated through an extensive simulation study.
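
To make the fractional imputation step concrete, the following is a minimal sketch for the simplest possible case: one fully observed categorical variable X and one categorical variable Y that is missing at random given X. It is not the ensemble method of the abstract; it uses a single saturated model fitted by EM, with no graphical model, no reduced models, and no double projection. The function name `fractional_imputation`, the coding of missing values as -1, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def fractional_imputation(x, y, n_x, n_y, n_iter=50):
    """Hypothetical helper: EM-based fractional imputation for two categorical variables.

    x: integer codes in 0..n_x-1, always observed.
    y: integer codes in 0..n_y-1, with -1 marking a missing value.
    Assumes every category of x appears at least once in the data.
    Returns the estimated joint cell probabilities and the fractional
    weights P(Y = y | X = x) assigned to the imputed values.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    observed = y >= 0

    # Cell counts from units where both X and Y are observed.
    full_counts = np.zeros((n_x, n_y))
    np.add.at(full_counts, (x[observed], y[observed]), 1.0)

    # Number of units with missing Y within each category of X.
    miss_counts = np.bincount(x[~observed], minlength=n_x).astype(float)

    # Start from uniform joint cell probabilities.
    joint = np.full((n_x, n_y), 1.0 / (n_x * n_y))

    for _ in range(n_iter):
        # E-step: fractional weights are the conditional probabilities of Y given X.
        cond = joint / joint.sum(axis=1, keepdims=True)
        # M-step: observed counts plus fractionally weighted imputed counts.
        expected = full_counts + miss_counts[:, None] * cond
        joint = expected / expected.sum()

    cond = joint / joint.sum(axis=1, keepdims=True)
    return joint, cond

# Toy data (an assumption for illustration): two units have missing Y.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 1, -1, 1, 1, 1, -1]
joint, weights = fractional_imputation(x, y, n_x=2, n_y=2)
```

Each unit with missing Y is effectively replaced by one imputed row per category of Y, weighted by the matching row of `weights`. With many variables the saturated joint table grows exponentially, which is the curse of dimensionality that the abstract addresses by combining reduced models.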