An interpretable data augmentation framework for improving generative modeling of synthetic clinical trial data
Afrah Shafquat · Mandis Beigi · Chufan Gao · Jason Mezey · Jimeng Sun · Jacob Aptekar
Event URL: https://openreview.net/forum?id=BZh0Eb2y2I

Synthetic clinical trial data are increasingly seen as a viable option for research applications when primary data are unavailable. A challenge in applying generative modeling approaches for this purpose is that many clinical trial datasets have small sample sizes. In this paper, we present an interpretable data augmentation framework for improving generative models used to produce synthetic clinical trial data. We apply this framework to three clinical trial datasets spanning different disease indications and evaluate the impact of factors such as initial dataset size, generative algorithm, and augmentation scale on metrics used to assess synthetic clinical trial data quality, including fidelity, utility, and privacy. The results indicate that this framework can considerably improve the quality of synthetic data produced by generative algorithms with respect to factors of high interest to end users of synthetic clinical trial data.

Author Information

Afrah Shafquat
Mandis Beigi (Columbia University)
Chufan Gao (University of Illinois Urbana-Champaign)
Jason Mezey (Cornell University)
Jimeng Sun (University of Illinois at Urbana-Champaign)
Jacob Aptekar (Medidata)
