Oral in Workshop: Machine Learning for Multimodal Healthcare Data
GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection
Ulas Bagci · Debesh Jha · Vanshali Sharma · Neethi Dasu · Nikhil Tomar · Steven Hicks · Pradip Das · M Bhuyan · Michael Riegler · Pål Halvorsen · Thomas de Lange
Keywords: [ Co-creation and human-in-the-loop ]
Artificial intelligence (AI) systems have gained enormous popularity across medical applications. However, their limited scalability and acceptance in real-time clinical practice are attributed to several factors, such as biased outcomes, lack of transparency, and under-performance on unseen data. A major reason for these drawbacks is the lack of large-scale, precisely labeled, diverse data. Such datasets are scarce because of legal restrictions and the manual effort that extensive annotation by medical experts requires. In this work, we present GastroVision, an open-access endoscopy dataset with the largest number of different anatomical landmarks, pathological abnormalities, and normal findings (36 classes in total) in the gastrointestinal (GI) tract. The dataset comprises 6,169 images acquired at two centers (Bærum Hospital, Norway, and Karolinska University Hospital, Stockholm, Sweden) and annotated by experienced GI endoscopists. We present preliminary computational diagnostic results with baseline deep learning models and validate the significance of the dataset for GI anomaly detection with extensive benchmarking. GastroVision can bring considerable benefit to the development of AI-based algorithms and can help unlock the potential of automated systems in GI disease detection. The dataset will be available at https://github.com/Anonymous/Gastrovision.