Poster in Workshop: 3rd Workshop on Interpretable Machine Learning in Healthcare (IMLH)

An interpretable data augmentation framework for improving generative modeling of synthetic clinical trial data
Afrah Shafquat · Mandis Beigi · Chufan Gao · Jason Mezey · Jimeng Sun · Jacob Aptekar
Keywords: [ data augmentation ] [ Machine Learning ] [ clinical trials ] [ Privacy ] [ synthetic data ]
Synthetic clinical trial data are increasingly seen as a viable option for research applications when primary data are unavailable. A challenge in applying generative modeling approaches for this purpose is that many clinical trial datasets have small sample sizes. In this paper, we present an interpretable data augmentation framework for improving generative models used to produce synthetic clinical trial data. We apply this framework to three clinical trial datasets spanning different disease indications and evaluate the impact of factors such as initial dataset size, generative algorithm, and augmentation scale on metrics used to assess synthetic clinical trial data quality, including fidelity, utility, and privacy. The results indicate that this framework can considerably improve the quality of synthetic data produced using generative algorithms when considering factors of high interest to end users of synthetic clinical trial data.