Synthetic Healthcare Data Generation and Assessment: Challenges, Methods, and Impact on Machine Learning

Ahmed M. Alaa · Mihaela van der Schaar


In this tutorial, we provide an overview of state-of-the-art techniques for synthesizing the two most common types of clinical data: tabular (or multidimensional) data and time-series data. In particular, we discuss generative modeling approaches for cross-sectional and time-series data based on generative adversarial networks (GANs), normalizing flows, and state-space models. We demonstrate how such models can be used to create synthetic training data for machine learning algorithms, and we compare the strengths and weaknesses of the different approaches. In addition, we discuss how to evaluate the quality of synthetic data and the performance of generative models: we highlight why evaluating generative models is harder than evaluating discriminative predictions, and we present metrics that quantify different aspects of synthetic data quality.
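As a rough illustration of one way synthetic training data might be assessed, the sketch below compares a classifier trained on real data with the same classifier trained on synthetic data, both scored on held-out real data. This is not a method taken from the tutorial: the per-class Gaussian generator is a deliberately simple stand-in for a GAN or normalizing flow, the toy data are simulated rather than clinical, and all function names (`fit_gaussian_generator`, `sample_synthetic`, `nearest_centroid_accuracy`, `make_real`) are hypothetical.

```python
import numpy as np

def fit_gaussian_generator(X, y):
    """Fit one Gaussian per class (assumed stand-in for a GAN or flow)."""
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        # Store class mean, covariance, and empirical class prior.
        params[label] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False), len(Xc) / len(X))
    return params

def sample_synthetic(params, n, rng):
    """Draw n labelled synthetic rows from the fitted per-class Gaussians."""
    labels = list(params)
    priors = np.array([params[c][2] for c in labels])
    y_syn = rng.choice(labels, size=n, p=priors)
    X_syn = np.empty((n, len(params[labels[0]][0])))
    for c in labels:
        mask = y_syn == c
        X_syn[mask] = rng.multivariate_normal(params[c][0], params[c][1], size=mask.sum())
    return X_syn, y_syn

def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Train a nearest-centroid classifier and return its test accuracy."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    preds = classes[dists.argmin(axis=1)]
    return float((preds == y_test).mean())

def make_real(n, rng):
    """Simulated 'real' tabular data: two classes with shifted means (toy assumption)."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 3)) + 3.0 * y[:, None]
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = make_real(500, rng)
X_test, y_test = make_real(500, rng)

# Synthetic training set sampled from a generator fitted to the real training set.
X_syn, y_syn = sample_synthetic(fit_gaussian_generator(X_train, y_train), 500, rng)

trtr = nearest_centroid_accuracy(X_train, y_train, X_test, y_test)  # train real, test real
tstr = nearest_centroid_accuracy(X_syn, y_syn, X_test, y_test)      # train synthetic, test real
print(f"train on real: {trtr:.3f}  train on synthetic: {tstr:.3f}")
```

A small gap between the two accuracies suggests the synthetic data preserve the structure this particular classifier relies on. It says nothing about other aspects of quality, such as diversity or privacy, which would need separate metrics.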
