Synthetic Healthcare Data Generation and Assessment: Challenges, Methods, and Impact on Machine Learning
Ahmed M. Alaa · Mihaela van der Schaar

In this tutorial, we provide an overview of state-of-the-art techniques for synthesizing the two most common types of clinical data, namely tabular (or multidimensional) data and time-series data. In particular, we discuss various generative modeling approaches based on generative adversarial networks (GANs), normalizing flows, and state-space models for cross-sectional and time-series data, demonstrating the use cases of such models in creating synthetic training data for machine learning algorithms and highlighting the comparative strengths and weaknesses of these different approaches. In addition, we discuss the issue of evaluating the quality of synthetic data and the performance of generative models; we highlight the challenges associated with evaluating generative models as compared to discriminative predictions and present various metrics that can be used to quantify different aspects of synthetic data quality.

Ahmed M. Alaa (UCLA)
Mihaela van der Schaar (University of Cambridge and UCLA)
Professor van der Schaar is John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge, a Turing Faculty Fellow at The Alan Turing Institute in London, and Chancellor's Professor at UCLA. She was elected IEEE Fellow in 2009. She has received numerous awards, including the Oon Prize on Preventative Medicine from the University of Cambridge (2018), an NSF Career Award (2004), 3 IBM Faculty Awards, the IBM Exploratory Stream Analytics Innovation Award, the Philips Make a Difference Award, and several best paper awards, including the IEEE Darlington Award. She holds 35 granted US patents. In 2019, she was identified by the National Endowment for Science, Technology and the Arts as the female researcher based in the UK with the most publications in the field of AI. She was also elected as a 2019 "Star in Computer Networking and Communications". Her current research focus is on machine learning, AI, and operations research for healthcare and medicine. For more details, see her website: http://www.vanderschaar-lab.com/

