Poster in Workshop: AI for Science: Scaling in AI for Scientific Discovery
Synthetic Data-driven Prediction of Height for Childhood Malnutrition
David Berthiaume · Yuan Tang · Chau Nguyen · Siyu Gai · Maria E. Mazzolenis · Weiwei Pan
Keywords: [ Computer Vision ] [ Ethical AI ] [ Height Prediction ] [ Synthetic Data Generation ] [ Transfer Learning ]
While computer vision approaches have demonstrated success in various image-based tasks, they face challenges with early childhood height prediction for malnutrition detection due to a scarcity of publicly available training data. Building public datasets for training and benchmarking machine learning models for this task is difficult because of the sensitive nature of the images. Although synthetic data have been employed in other data-scarce machine learning tasks, no such data exist for predicting children's height. In this work, we develop a novel generative algorithm to create synthetic images (including depth maps, segmentation maps, and key points) with non-photorealistic human figures, thereby providing an ethical and scalable solution to pre-train and evaluate computer vision models in a controlled setting. Our synthetic dataset models a wide variety of key real-world variables such as physical proportions, lighting, and posture. We demonstrate the potential of our dataset in a transfer learning setting by showing that models pre-trained on our synthetic data outperform baseline approaches when applied to real-world prediction tasks.
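As a rough illustration of how a synthetic dataset might vary physical proportions, lighting, and posture, the sketch below samples randomized scene parameters for each synthetic figure. It is not the authors' generative algorithm: the `SceneParams` fields, the `POSTURES` list, and every numeric range are hypothetical placeholders chosen only to make the example concrete, and rendering of images, depth maps, segmentation maps, and key points is left out entirely.

```python
import random
from dataclasses import dataclass

# Hypothetical posture labels; the abstract does not list the postures used.
POSTURES = ("standing", "lying", "arms_raised")


@dataclass
class SceneParams:
    """Parameters describing one synthetic scene (all fields are assumptions)."""
    height_cm: float             # ground-truth label a model would predict
    head_to_height_ratio: float  # stand-in for "physical proportions"
    light_intensity: float       # stand-in for "lighting", arbitrary 0..1 scale
    posture: str


def sample_scene(rng: random.Random) -> SceneParams:
    """Draw one scene; the ranges are illustrative, not taken from the paper."""
    return SceneParams(
        height_cm=rng.uniform(45.0, 120.0),
        head_to_height_ratio=rng.uniform(0.18, 0.28),
        light_intensity=rng.uniform(0.2, 1.0),
        posture=rng.choice(POSTURES),
    )


def sample_dataset(n: int, seed: int = 0) -> list:
    """Sample n scenes reproducibly from a seeded generator."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]


if __name__ == "__main__":
    for params in sample_dataset(3):
        print(params)
```

In a pipeline of this kind, each sampled `SceneParams` would be passed to a renderer that draws a non-photorealistic figure, with `height_cm` kept as the regression target for pre-training.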