

Oral in Workshop: 3rd Workshop on Interpretable Machine Learning in Healthcare (IMLH)

An interpretable data augmentation framework for improving generative modeling of synthetic clinical trial data

Afrah Shafquat · Mandis Beigi · Chufan Gao · Jason Mezey · Jimeng Sun · Jacob Aptekar

Keywords: [ synthetic data ] [ Privacy ] [ clinical trials ] [ Machine Learning ] [ data augmentation ]


Abstract:

Synthetic clinical trial data are increasingly seen as a viable option for research applications when primary data are unavailable. A challenge in applying generative modeling approaches for this purpose is that many clinical trial datasets have small sample sizes. In this paper, we present an interpretable data augmentation framework for improving generative models used to produce synthetic clinical trial data. We apply this framework to three clinical trial datasets spanning different disease indications and evaluate the impact of factors such as initial dataset size, generative algorithm, and augmentation scale on metrics used to assess synthetic clinical trial data quality, including fidelity, utility, and privacy. The results indicate that this framework can considerably improve the quality of synthetic data produced using generative algorithms when considering factors of high interest to end users of synthetic clinical trial data.
