This new workshop brings together interdisciplinary scientists and practitioners working at the intersection of machine learning (ML) with medicine, pathology and biology, to present new methods and solutions for healthcare challenges across the full range of multimodal, often highly heterogeneous and complex patient data to the wider ICML community. Topics of interest include, but are not limited to:
• Multimodal fusion and learning in medical imaging, digital pathology, computational biology, genetics and electronic healthcare records
• Multimodal biomarkers for early prediction of disease onset, therapeutic response or disease recurrence
• Benchmarking, domain shifts and generalization of ML in multimodal healthcare data
• ML for dealing with the inherent sparsity, incompleteness and complexity of multimodal healthcare data
• ML for ensuring fairness and reducing bias in healthcare applications
• ML for privacy preservation in healthcare data
• Co-creation and human-in-the-loop approaches for ML in healthcare
Sat 12:00 p.m. - 12:10 p.m. | ICML ML4MHD Workshop (Opening)
Julia Schnabel · Andreas Maier · Pallavi Tiwari · Oliver Stegle
Sat 12:10 p.m. - 12:40 p.m. | Learning and using a multimodal single cell atlas (Keynote - Fabian Theis, prerecorded)
Julia Schnabel · Fabian Theis
Sat 12:50 p.m. - 1:00 p.m. | Semi-supervised Cooperative Learning for Multiomics Data Fusion (Oral)
Multiomics data fusion integrates diverse data modalities, ranging from transcriptomics to proteomics, to gain a comprehensive understanding of biological systems and enhance predictions on outcomes of interest related to disease phenotypes and treatment responses. Cooperative learning, a recently proposed method, unifies the commonly used fusion approaches, including early and late fusion, and offers a systematic framework for leveraging the shared underlying relationships across omics to strengthen signals. However, the challenge of acquiring large-scale labeled data remains, and there are cases where multiomics data are available but without annotated labels. To harness the potential of unlabeled multiomics data, we introduce semi-supervised cooperative learning. By utilizing an "agreement penalty", our method incorporates the additional unlabeled data in the learning process and achieves consistently superior predictive performance on simulated data and a real multiomics study of aging. It offers an effective solution to multiomics data fusion in settings with both labeled and unlabeled data and maximizes the utility of available data resources, with the potential of significantly improving predictive models for diagnostics and therapeutics in an increasingly multiomics world.
Daisy Yi Ding · Xiaotao Shen · Michael Snyder · Rob Tibshirani
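The role of the agreement penalty can be illustrated with a small sketch: a two-modality quadratic objective whose agreement term is evaluated on both labeled and unlabeled samples. This is a minimal NumPy illustration under assumed variable names and a plain ridge penalty standing in for the sparsity penalties used in the actual method.

```python
import numpy as np

def semi_sup_coop_fit(Xl, Zl, y, Xu, Zu, rho=1.0, lam=0.1, lr=1e-3, n_iter=5000):
    """Sketch of semi-supervised cooperative learning for two modalities.

    Xl, Zl : labeled views (n_l x p, n_l x q);  y : outcomes (n_l,)
    Xu, Zu : unlabeled views (n_u x p, n_u x q)
    rho    : agreement-penalty weight; lam : ridge weight (illustrative stand-in
             for the sparsity penalty used in the paper).
    """
    p, q = Xl.shape[1], Zl.shape[1]
    tx, tz = np.zeros(p), np.zeros(q)
    for _ in range(n_iter):
        # prediction residual on labeled data
        r = Xl @ tx + Zl @ tz - y
        # agreement residuals: the two views should make similar predictions
        al = Xl @ tx - Zl @ tz            # labeled agreement
        au = Xu @ tx - Zu @ tz            # unlabeled agreement (the semi-supervised part)
        gx = Xl.T @ r + rho * (Xl.T @ al + Xu.T @ au) + lam * tx
        gz = Zl.T @ r - rho * (Zl.T @ al + Zu.T @ au) + lam * tz
        tx -= lr * gx / len(y)
        tz -= lr * gz / len(y)
    return tx, tz
```

Setting rho to zero recovers plain early-style fusion on the labeled data; increasing rho pushes the two views toward agreeing on every sample, labeled or not.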
Sat 1:00 p.m. - 1:30 p.m. | Coffee
Sat 1:30 p.m. - 2:00 p.m. | The pulse of ethical machine learning and health (Keynote - Marzyeh Ghassemi, prerecorded)
Marzyeh Ghassemi
Sat 2:00 p.m. - 2:10 p.m. | Speed-of-Sound Mapping for Pulse-Echo Ultrasound Raw Data using Linked-Autoencoders (Oral, live on Zoom)
Recent studies showed the possibility of extracting speed-of-sound (SoS) information from pulse-echo ultrasound raw data (a.k.a. RF data) using deep neural networks that are fully trained on simulated data. These methods take sensor-domain data, i.e., RF data, as input and train a network in an end-to-end fashion to learn the implicit mapping between the RF data domain and the SoS domain. However, such networks are prone to overfitting to simulated data, which results in poor performance and instability when tested on measured data. We propose a novel method for SoS mapping employing learned representations from two linked autoencoders. We test our approach on simulated and measured data acquired from human breast-mimicking phantoms. We show that SoS mapping is possible using the representations learned by the linked autoencoders. The proposed method has a Mean Absolute Percentage Error (MAPE) of 2.39% on the simulated data. On the measured data, the predictions of the proposed method are close to the expected values (MAPE of 1.1%). Compared to an end-to-end trained network, the proposed method shows higher stability and reproducibility.
Farnaz Khun Jush · Peter M. Dueppenbecker · Andreas Maier
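One way to "link" two autoencoders is to penalize the distance between the latent codes they produce for paired samples, as in the minimal PyTorch sketch below; the network sizes, the L2 linking loss and the inference route (RF encoder into SoS decoder) are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    """Small fully connected autoencoder (placeholder for the real encoders/decoders)."""
    def __init__(self, in_dim, latent_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, in_dim))
    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

rf_ae, sos_ae = AE(in_dim=4096), AE(in_dim=1024)   # flattened RF frames and SoS maps (sizes illustrative)
opt = torch.optim.Adam(list(rf_ae.parameters()) + list(sos_ae.parameters()), lr=1e-4)

def training_step(rf, sos, link_weight=1.0):
    """rf: (B, 4096) raw-data batch; sos: (B, 1024) matching speed-of-sound maps."""
    z_rf, rf_hat = rf_ae(rf)
    z_sos, sos_hat = sos_ae(sos)
    recon = nn.functional.mse_loss(rf_hat, rf) + nn.functional.mse_loss(sos_hat, sos)
    link = nn.functional.mse_loss(z_rf, z_sos)      # tie the two latent spaces together
    loss = recon + link_weight * link
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Inference on measured RF data: encode with the RF encoder, decode with the SoS decoder.
def predict_sos(rf):
    with torch.no_grad():
        return sos_ae.dec(rf_ae.enc(rf))
```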
Sat 2:10 p.m. - 2:20 p.m. | HOOREX: Higher Order Optimizers for 3D Recovery from X-Ray Images (Oral, prerecorded)
We propose a method to address the challenge of generating a 3D digital twin of a patient during an X-ray guided medical procedure from a single 2D X-ray projection image, a problem that is inherently ill-posed. To tackle this issue, we aim to infer the parameters of the Bones, Organs and Skin Shape (BOSS) model, a deformable human shape and pose model. There are currently two main approaches for model-based estimation. Optimization-based methods iteratively fit a body model to 2D measurements; they produce accurate 2D alignments but are slow and sensitive to initialization. On the other hand, regression-based methods use neural networks to estimate the model parameters directly, resulting in faster predictions but often with misalignments. Our approach combines the benefits of both techniques by implementing a fully differentiable paradigm through the use of higher-order optimizers that only require the Jacobian, which can be determined implicitly. The network was trained on synthetic CT and real CBCT image data, ensuring view independence. We demonstrate the potential clinical applicability of our method by validating it on multiple datasets covering diverse anatomical regions, achieving an error of 27.98 mm.
Karthik Shetty · Annette Birkhold · Bernhard Egger · Srikrishna Jaganathan · Norbert Strobel · Markus Kowarschik · Andreas Maier
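The "higher-order optimizers that only require the Jacobian" can be illustrated by a damped Gauss-Newton (Levenberg-Marquardt-style) step, which builds curvature as JᵀJ from first derivatives alone. The sketch below is a generic keypoint-fitting step; `project` is a hypothetical helper and nothing here is specific to the BOSS pipeline.

```python
import torch

def gauss_newton_step(params, residual_fn, damping=1e-3):
    """One damped Gauss-Newton update: params (P,), residual_fn -> (R,) residuals.

    Curvature is approximated as J^T J, so only the Jacobian of the
    residuals is needed (no second derivatives of the model).
    """
    r = residual_fn(params)                                       # (R,)
    J = torch.autograd.functional.jacobian(residual_fn, params)   # (R, P)
    H = J.T @ J + damping * torch.eye(params.numel())             # damped Gauss-Newton Hessian
    g = J.T @ r
    delta = torch.linalg.solve(H, g)
    return params - delta

# Illustrative use: align projected model keypoints to detected 2D keypoints.
def make_residual_fn(project, target_2d):
    # project(params) -> (K, 2) projected keypoints of the body model (hypothetical helper)
    return lambda p: (project(p) - target_2d).reshape(-1)
```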
Sat 2:20 p.m. - 2:30 p.m. | Neural Graph Revealers (Oral)
Sparse graph recovery methods work well when the data follow their assumptions, but they are often not designed for downstream probabilistic queries. This limits their adoption to only identifying connections among domain variables. On the other hand, Probabilistic Graphical Models (PGMs) learn an underlying base graph between variables together with a distribution over them. PGM design choices are carefully made so that the inference and sampling algorithms are efficient, which brings in certain restrictions and often simplifying assumptions. In this work, we propose Neural Graph Revealers (NGRs), an attempt to efficiently merge sparse graph recovery methods with PGMs into a single flow. The problem setting consists of input data X with D features and M samples, and the task is to recover a sparse graph showing connections between the features while learning a probability distribution over the D features at the same time. NGRs view the neural network as a 'glass box', or more specifically as a multitask learning framework. We introduce a 'graph-constrained path norm' that NGRs leverage to learn a graphical model capturing complex non-linear functional dependencies between features in the form of an undirected sparse graph. Furthermore, NGRs can handle multimodal inputs like images, text, categorical data and embeddings, which is not straightforward to incorporate in existing methods. We show experimental results on data from Gaussian graphical models and a multimodal infant mortality dataset from the CDC.
Harsh Shrivastava · Urszula Chajewska
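The path-norm idea can be sketched as follows: for an MLP that maps the D features to D outputs, multiplying the absolute weight matrices yields a D×D matrix whose (j, i) entry is nonzero only if some path of nonzero weights connects input i to output j, so an L1 penalty on it prunes paths and induces a sparse feature-dependency graph. The reconstruction objective and layer sizes below are illustrative simplifications of NGRs, not the exact formulation.

```python
import torch
import torch.nn as nn

D = 16                                    # number of features
mlp = nn.Sequential(nn.Linear(D, 64), nn.Sigmoid(), nn.Linear(64, D))

def path_matrix(model):
    """Product of absolute weight matrices: entry (j, i) > 0 only if some
    path of nonzero weights connects input feature i to output feature j."""
    P = None
    for m in model:
        if isinstance(m, nn.Linear):
            W = m.weight.abs()
            P = W if P is None else W @ P
    return P                               # shape (D, D)

def ngr_style_loss(x, sparsity=1e-2):
    """Reconstruct each feature from the others; an L1 penalty on the path
    matrix prunes input-output paths and so sparsifies the dependency graph."""
    x_hat = mlp(x)
    recon = nn.functional.mse_loss(x_hat, x)
    P = path_matrix(mlp)
    graph = (P + P.T) * (1 - torch.eye(D))  # symmetrize, ignore self-loops
    return recon + sparsity * graph.sum()
```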
Sat 2:30 p.m. - 2:40 p.m. | SIM-CNN: Self-Supervised Individualized Multimodal Learning for Stress Prediction on Nurses Using Biosignals (Oral)
Precise stress recognition from biosignals is inherently challenging due to the heterogeneous nature of stress, individual physiological differences, and scarcity of labeled data. To address these issues, we developed SIM-CNN, a self-supervised learning (SSL) method for personalized stress-recognition models using multimodal biosignals. SIM-CNN involves training a multimodal 1D convolutional neural network (CNN) that leverages SSL to utilize massive unlabeled data, optimizing individual parameters and hyperparameters for precision health. SIM-CNN is evaluated on a real-world multimodal dataset collected from nurses that consists of 1,250 hours of biosignals, 83 hours of which are explicitly labeled with stress levels. SIM-CNN is pre-trained on the unlabeled biosignal data with next-step time series forecasting and fine-tuned on the labeled data for stress classification. Compared to SVMs and baseline CNNs with an identical architecture but without self-supervised pre-training, SIM-CNN shows clear improvements in the average AUC and accuracy, but a further examination of the data also suggests some intrinsic limitations of patient-specific stress recognition using biosignals recorded in the wild.
Sunmin Eom · Sunwoo Eom · Peter Washington
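The two-stage recipe (self-supervised next-step forecasting on unlabeled biosignals, then fine-tuning for stress classification) can be sketched with a generic 1D CNN encoder; the layer sizes, channel count and heads below are assumptions, not the SIM-CNN architecture.

```python
import torch
import torch.nn as nn

class Encoder1D(nn.Module):
    """Generic multimodal 1D CNN: input (B, n_channels, T) -> embedding (B, 128)."""
    def __init__(self, n_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, 128))
    def forward(self, x):
        return self.net(x)

n_channels = 6                                   # e.g. EDA, HR, temperature, 3-axis accel (illustrative)
enc = Encoder1D(n_channels)
forecast_head = nn.Linear(128, n_channels)       # predict the next sample of every channel
clf_head = nn.Linear(128, 1)                     # stress / no-stress logit

def pretrain_step(window, next_sample, opt):
    """Self-supervised: window (B, C, T) -> predict next_sample (B, C)."""
    loss = nn.functional.mse_loss(forecast_head(enc(window)), next_sample)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def finetune_step(window, stress_label, opt):
    """Supervised: reuse the pretrained encoder for stress classification."""
    logit = clf_head(enc(window)).squeeze(1)
    loss = nn.functional.binary_cross_entropy_with_logits(logit, stress_label.float())
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```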
Sat 2:40 p.m. - 2:50 p.m. | Death Prediction by Race in Colorectal Cancer Patients Using Machine Learning Approaches (Oral)
Colorectal cancer (CRC) cases have increased worldwide. In the USA, African Americans have a higher incidence than other races. In this paper, we aimed to use ML to study specific factors or variables affecting the high incidence of CRC mortality by race after receiving treatments and to create models to predict death. We used data from metastatic CRC gene sequencing studies. Patients were included if they had received chemotherapy and were grouped by race (White American and African American). Five supervised ML methods were implemented to create prediction models, and a Mini-Batched Normalized Mutual Information Hybrid Feature Selection method was used to extract features from more than 25,000 genes. As a result, the best model was obtained with the Classification and Regression Trees algorithm (AUC-ROC = 0.91 for White Americans, AUC-ROC = 0.89 for African Americans). The features "DBNL gene", "PIN1P1 gene" and "Days-from-birth" were the most significant variables associated with CRC mortality for White Americans, while "IFI44L gene", "ART4 gene" and "Sex" were the most relevant for African Americans. In conclusion, these features and models are promising for further analysis and as decision-making tools to study CRC from a precision medicine perspective for minority health.
Abiel Roche-Lima · Frances Aponte · Frances Heredia Negron · Brenda Nieves-Rodriguez
Sat 2:50 p.m. - 3:00 p.m. | InterSynth: a semi-synthetic framework for benchmarking prescriptive inference from observational data (Oral)
Treatments are prescribed to individuals in pursuit of contemporaneously unobserved outcomes, based on evidence derived from populations with historically observed treatments and outcomes. Since neither treatments nor outcomes are typically replicable in the same individual, alternatives remain counterfactual in both settings. Prescriptive fidelity therefore cannot be evaluated empirically at the individual level, forcing reliance on lossy, group-level estimates, such as average treatment effects, that presume an implausibly low ceiling on individuation. The lack of empirical ground truths critically impedes the development of individualised prescriptive models, on which realising personalised care inevitably depends. Here we present InterSynth, a general platform for modelling biologically plausible, empirically informed, semi-synthetic ground truths for the evaluation of prescriptive models operating at the individual level. InterSynth permits comprehensive simulation of heterogeneous treatment effect sizes and variability, and observed and unobserved confounding treatment allocation biases, with explicit modelling of decoupled response failure and spontaneous recovery. Operable with high-dimensional data such as high-resolution brain lesion maps, InterSynth offers a principled means of quantifying the fidelity of prescriptive models across a wide range of plausible real-world conditions. We demonstrate end-to-end use of the platform with an example employing real neuroimaging data from patients with ischaemic stroke, volume image-based succinct lesion representations, and semi-synthetic ground truths informed by functional, transcriptomic and receptomic data. We make our platform freely available to the scientific community.
Dominic Giles · Robert Gray · Chris Foulon · Guilherme Pombo · Tianbo Xu · James K Ruffle · Rolf Jäger · Jorge Cardoso · Sebastien Ourselin · Geraint Rees · Ashwani Jha · Parashkev Nachev
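The kind of semi-synthetic ground truth described above can be sketched as follows: take real patient features, simulate heterogeneous treatment effects, a confounded (non-random) allocation, and decoupled response-failure and spontaneous-recovery processes, so every patient's counterfactual outcome is known and prescriptive models can be scored individually. All functional forms, rates and variable names below are illustrative, not those of the InterSynth platform.

```python
import numpy as np

def semi_synthetic_outcomes(X, rng, p_fail=0.2, p_spont=0.1, confound=1.5):
    """X: (N, D) real patient features (e.g. lesion representations).
    Returns treatment assignment, observed outcome, and both potential outcomes."""
    N, D = X.shape
    w_eff = rng.normal(size=D)
    effect = (X @ w_eff > 0).astype(float)            # heterogeneous: treatment helps only some patients
    # Confounded allocation: sicker-looking patients are more likely to be treated.
    severity = X @ rng.normal(size=D)
    p_treat = 1 / (1 + np.exp(-confound * severity))
    t = rng.binomial(1, p_treat)
    # Decoupled nuisance processes.
    fails = rng.binomial(1, p_fail, N)                # treatment given but without effect
    spont = rng.binomial(1, p_spont, N)               # recovery without effective treatment
    y1 = np.maximum(effect * (1 - fails), spont)      # potential outcome under treatment
    y0 = spont                                        # potential outcome without treatment
    y_obs = np.where(t == 1, y1, y0)
    return t, y_obs, y0, y1

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))
t, y_obs, y0, y1 = semi_synthetic_outcomes(X, rng)
# Individual-level fidelity: did a model prescribe treatment exactly where y1 > y0?
```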
Sat 3:00 p.m. - 4:30 p.m. | Lunch (not provided)
Sat 4:30 p.m. - 5:00 p.m. | Charting the Course: A Deep Dive into the Evolution and Future Trajectory of Multimodal AI in Radiology (Keynote - Judy Gichoya, prerecorded)
Judy Wawira
Sat 5:00 p.m. - 5:10 p.m. | Multimodal Representation Learning of Cardiovascular Magnetic Resonance Imaging (Oral, prerecorded)
Self-supervised learning is crucial for clinical imaging applications, given the lack of explicit labels in healthcare. However, conventional approaches that rely on precise vision-language alignment are not always feasible in complex clinical imaging modalities, such as cardiac magnetic resonance (CMR). CMR provides a comprehensive visualization of cardiac anatomy, physiology, and microstructure, making it challenging to interpret. Additionally, CMR reports require synthesizing information from sequences of images and different views, resulting in potentially weak alignment between the study and diagnosis report pair. To overcome these challenges, we propose CMRformer, a multimodal learning framework to jointly learn sequences of CMR images and the associated cardiologist's reports. Moreover, one of the major obstacles to improving CMR studies is the lack of large, publicly available datasets. To bridge this gap, we collected a large CMR dataset, which consists of 13,787 studies from clinical cases. By utilizing our proposed CMRformer and our collected dataset, we achieved remarkable performance in real-world clinical tasks, such as CMR image retrieval and diagnosis report retrieval. Furthermore, the learned representations prove practically helpful for downstream applications, such as disease classification. Our work could potentially expedite progress in CMR studies and lead to more accurate and effective diagnosis and treatment.
Jielin Qiu · Peide Huang · Makiya Nakashima · Jaehyun Lee · Jiacheng Zhu · Wilson Tang · Pohao Chen · Christopher Nguyen · Byung-Hak Kim · Debbie Kwon · Douglas Weber · Ding Zhao · David Chen
Sat 5:10 p.m. - 5:20 p.m. | Prompt-based Generative Replay: A Text-to-Image Approach for Continual Learning in Medical Settings (Oral)
Episodic replay methods, which store and replay past data, have been effective in handling distribution shifts in continual learning. However, due to regulatory and privacy concerns around data sharing, their applicability can be limited. In this work, we introduce two novel healthcare benchmarks for domain-incremental continual learning: diabetic retinopathy severity classification and dermoscopy skin lesion detection, and highlight issues of poor forward and backward transferability in simple baselines. To overcome these challenges, we propose a novel method called prompt-based generative replay. By leveraging a text-to-image diffusion model for synthetic data generation, our approach effectively preserves previously learned knowledge while adapting to new data distributions. Our experiments demonstrate that our prompt-based generative replay significantly outperforms competitive baselines, resulting in an average increase of up to 5 points in average AUC for the skin lesions benchmark and up to 2 points for the diabetic retinopathy benchmark.
Yewon Byun · Saurabh Garg · Sanket Vaibhav Mehta · Jayashree Kalpathy-Cramer · Praveer Singh · Bryan Wilder · Zachary Lipton
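The replay loop can be sketched as follows: when training on a new domain, synthetic images of earlier domains are generated from text prompts with a frozen text-to-image model and mixed into each batch, so no past patient data need be stored. `generate_images` is a hypothetical stand-in for the diffusion-model call, and the prompt template is illustrative only.

```python
import torch
import torch.nn as nn

def generate_images(prompt: str, n: int) -> torch.Tensor:
    """Hypothetical wrapper around a frozen text-to-image diffusion model;
    would return n synthetic images (n, 3, 224, 224) for the given prompt."""
    raise NotImplementedError

def replay_batch(past_classes, per_class=4):
    """Build a synthetic batch for previously seen domains from prompts alone."""
    images, labels = [], []
    for cls_id, cls_name in past_classes:
        x = generate_images(f"dermoscopy image of {cls_name}", per_class)  # illustrative prompt
        images.append(x)
        labels += [cls_id] * per_class
    return torch.cat(images), torch.tensor(labels)

def train_step(model, opt, cur_x, cur_y, past_classes):
    """Mix real current-domain data with prompt-generated replay of past domains."""
    if past_classes:
        rep_x, rep_y = replay_batch(past_classes)
        cur_x, cur_y = torch.cat([cur_x, rep_x]), torch.cat([cur_y, rep_y])
    loss = nn.functional.cross_entropy(model(cur_x), cur_y)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```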
Sat 5:20 p.m. - 5:30 p.m. | Exploiting Partial Common Information Microstructure for Multi-Modal Brain Tumor Segmentation (Oral)
Learning with multiple modalities is crucial for automated brain tumor segmentation from magnetic resonance imaging data. Explicitly optimizing the common information shared among all modalities (e.g., by maximizing the total correlation) has been shown to achieve better feature representations and thus enhance the segmentation performance. However, existing approaches are oblivious to partial common information shared by subsets of the modalities. In this paper, we show that identifying such partial common information can significantly boost the discriminative power of image segmentation models. In particular, we introduce a novel concept of partial common information mask (PCI-mask) to provide a fine-grained characterization of what partial common information is shared by which subsets of the modalities. By solving a masked correlation maximization and simultaneously learning an optimal PCI-mask, we identify the latent microstructure of partial common information and leverage it in a self-attention module to selectively weight different feature representations in multi-modal data. We implement our proposed framework on the standard U-Net. Our experimental results on the Multi-modal Brain Tumor Segmentation Challenge (BraTS) datasets consistently outperform those of state-of-the-art segmentation baselines, with validation Dice similarity coefficients of 0.920, 0.897, 0.837 for the whole tumor, tumor core, and enhancing tumor on BraTS-2020.
Yongsheng Mei · Tian Lan · Guru Venkataramani
Sat 5:30 p.m. - 5:40 p.m. | RobustSsF: Robust Missing Modality Brain Tumor Segmentation with Self-supervised Learning-based Scenario-specific Fusion (Oral)
All modalities of Magnetic Resonance Imaging (MRI) have an essential role in diagnosing brain tumors, but missing or incomplete modalities in multimodal MRI pose challenges. Existing models have failed to achieve robust performance across all scenarios. To address this issue, this paper proposes a novel 4-encoder/4-decoder architecture that incorporates both "dedicated" and "single" models. Our model, named SsFnL, includes multiple Scenario-specific Fusion (SsF) decoders that construct different features depending on the missing-modality scenario. To train it, we introduce a novel self-supervised learning scheme and a Couple Regularization (CReg) loss function to achieve robust learning, as well as a Lifelong Learning Strategy (LLS) to enhance model performance. The experimental results on BraTS2018 demonstrate that SsFnL constructs the most robust model, achieving state-of-the-art results in the TC and ET sub-regions when T1ce is missing, and in other challenging scenarios and sub-regions.
Jeongwon Lee · Daeshik Kim
Sat 5:40 p.m. - 5:50 p.m. | Interpretable and Intervenable Ultrasonography-based Machine Learning Models for Pediatric Appendicitis (Oral)
Appendicitis is among the most frequent reasons for pediatric abdominal surgeries. With recent advances in machine learning, data-driven decision support could help clinicians diagnose and manage patients while reducing the number of non-critical surgeries. However, previous decision support systems for appendicitis have focused on clinical, laboratory, scoring, and computed tomography data and have ignored the use of abdominal ultrasound, despite its noninvasive nature and widespread availability. In this work, we present interpretable machine learning models for predicting the diagnosis, management and severity of suspected appendicitis using ultrasound images. To this end, our approach utilizes concept bottleneck models (CBM) that facilitate interpretation and interaction with high-level concepts that are understandable to clinicians. Furthermore, we extend CBMs to prediction problems with multiple views and incomplete concept sets. Our models were trained on a dataset comprising 579 pediatric patients with 1709 ultrasound images accompanied by clinical and laboratory data. Results show that our proposed method enables clinicians to utilize a human-understandable and intervenable predictive model without compromising performance or requiring time-consuming image annotation when deployed.
Julia Vogt · Ricards Marcinkevics · Patricia Reis Wolfertstetter · Ugne Klimiene · Kieran Chin-Cheong · Alyssia Paschke · Julia Zerres · Markus Denzinger · David Niederberger · Sven Wellmann · Ece Ozkan · Christian Knorr
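A concept bottleneck model with test-time intervention can be sketched in a few lines: images map to clinician-readable concepts, a simple head maps concepts to the label, and a clinician can overwrite any predicted concept before the final prediction. The backbone, concept names and loss weighting below are assumptions, and the multi-view and incomplete-concept extensions described in the abstract are omitted.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

n_concepts, n_classes = 10, 3                     # e.g. sonographic findings -> management class (illustrative)

concept_net = resnet18(num_classes=n_concepts)    # image -> concept logits
label_head = nn.Linear(n_concepts, n_classes)     # concepts -> diagnosis/management logits

def predict(images, known_concepts=None, mask=None):
    """known_concepts/mask: clinician intervention — where mask == 1, the predicted
    concept probability is replaced by the value supplied by the clinician."""
    c = torch.sigmoid(concept_net(images))
    if known_concepts is not None:
        c = torch.where(mask.bool(), known_concepts, c)
    return c, label_head(c)

def train_step(images, concept_labels, y, opt, lam=1.0):
    c, logits = predict(images)
    loss = nn.functional.cross_entropy(logits, y) \
         + lam * nn.functional.binary_cross_entropy(c, concept_labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```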
Sat 5:50 p.m. - 6:00 p.m. | GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection (Oral)
Artificial intelligence (AI) systems in different medical applications have gained enormous popularity. However, their limited scalability and acceptance in real-time clinical practice are attributed to several factors, such as biased outcomes, limited transparency, and under-performance on unseen data. The lack of large-scale, precisely labeled, diverse data is a major reason for these drawbacks. Such datasets are sparsely available due to the legal restrictions and manual efforts required for extensive annotation with medical expertise. In this work, we present GastroVision, an open-access endoscopy dataset with the largest number of different anatomical landmarks, pathological abnormalities, and normal findings (a total of 36 classes) in the gastrointestinal (GI) tract. The dataset comprises 6,169 images acquired at two centers (Bærum Hospital in Norway and Karolinska University Hospital, Stockholm, Sweden) and was annotated by experienced GI endoscopists. Preliminary computational diagnostic results with baseline deep learning models are presented. We validate the significance of our dataset for GI anomaly detection with extensive benchmarking. The GastroVision dataset can bring considerable benefit in developing AI-based algorithms and can help unlock the potential of automated systems in GI disease detection. The dataset will be available at https://github.com/Anonymous/Gastrovision.
Ulas Bagci · Debesh Jha · Vanshali Sharma · Neethi Dasu · Nikhil Tomar · Steven Hicks · Pradip Das · M Bhuyan · Michael Riegler · Pål Halvorsen · Thomas de Lange
Sat 6:00 p.m. - 6:30 p.m. | Coffee
Sat 6:30 p.m. - 6:40 p.m. | Can Brain Signals Reveal Inner Alignment with Human Languages? (Oral, prerecorded)
Brain signals, such as Electroencephalography (EEG), and human languages have been widely explored independently for many downstream tasks; however, the connection between them has not been well explored. In this study, we explore the relationship and dependency between EEG and language. To study this at the representation level, we introduce MTAM, a Multimodal Transformer Alignment Model, to observe coordinated representations between the two modalities. We use various relationship alignment-seeking techniques, such as Canonical Correlation Analysis and Wasserstein Distance, as loss functions to transfigure features. On downstream applications, sentiment analysis and relation detection, we achieve new state-of-the-art results on two datasets, ZuCo and K-EmoCon. Our method achieves an F1-score improvement of 16.5% on K-EmoCon and 27% on ZuCo for sentiment analysis, and 31.1% on ZuCo for relation detection. In addition, we provide interpretations of the performance improvement: (1) feature distributions show the effectiveness of the alignment module for discovering and encoding the relationship between EEG and language; (2) alignment weights show the influence of different language semantics as well as EEG frequency features; (3) brain topographical maps provide an intuitive demonstration of the connectivity in the brain regions. Our anonymous code is available at https://anonymous.4open.science/r/ICML-109F/.
Jielin Qiu · William Han · Jiacheng Zhu · Mengdi Xu · Douglas Weber · Bo Li · Ding Zhao
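The distribution-alignment idea can be sketched with a simple, differentiable stand-in for the Wasserstein term: a per-dimension 1D Wasserstein-2 distance between batches of EEG and text embeddings, computed by sorting. The encoder sizes and the omission of the CCA term are simplifications, not the MTAM formulation.

```python
import torch
import torch.nn as nn

eeg_enc = nn.Sequential(nn.Linear(840, 256), nn.ReLU(), nn.Linear(256, 128))   # EEG features -> shared space (dims illustrative)
txt_enc = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 128))   # sentence embedding -> shared space

def wasserstein_1d(a, b):
    """Per-dimension 1D Wasserstein-2 distance between two batches (B, D),
    computed by sorting each dimension; a simple stand-in for the
    distribution-alignment losses (CCA / Wasserstein) mentioned above."""
    return ((torch.sort(a, dim=0).values - torch.sort(b, dim=0).values) ** 2).mean()

def alignment_step(eeg, text_emb, y, clf, opt, lam=0.1):
    ze, zt = eeg_enc(eeg), txt_enc(text_emb)
    task = nn.functional.cross_entropy(clf(torch.cat([ze, zt], dim=1)), y)  # e.g. sentiment label
    loss = task + lam * wasserstein_1d(ze, zt)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```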
Sat 6:40 p.m. - 6:50 p.m. | Latent Masking for Multimodal Self-supervised Learning in Health Timeseries (Oral, prerecorded)
Limited availability of labeled data for machine learning on biomedical time-series hampers progress in the field. Self-supervised learning (SSL) is a promising approach to learning data representations without labels. However, current SSL methods require expensive computations for negative pairs and are designed for single modalities, limiting their versatility. To overcome these limitations, we introduce CroSSL (Cross-modal SSL). CroSSL introduces two novel concepts: masking intermediate embeddings from modality-specific encoders and aggregating them into a global embedding using a cross-modal aggregator. This enables the handling of missing modalities and end-to-end learning of cross-modal patterns without prior data preprocessing or time-consuming negative-pair sampling. We evaluate CroSSL on various multi-modal time-series benchmarks, including both medical-grade and consumer biosignals. Our results demonstrate superior performance compared to previous SSL techniques and supervised benchmarks with minimal labeled data. We additionally analyze the impact of different masking ratios and strategies and assess the robustness of the learned representations to missing modalities. Overall, our work achieves state-of-the-art performance while highlighting the benefits of masking latent embeddings for cross-modal learning in temporal health data.
Shohreh Deldari · Dimitrios Spathis · Mohammad Malekzadeh · Fahim Kawsar · Flora Salim · Akhil Mathur
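The latent-masking idea can be sketched as follows: encode each modality, randomly mask whole latent embeddings, fuse the survivors into a global embedding, and pull two differently masked views of the same samples together without negative pairs. The encoder dimensions and the cosine objective are illustrative choices, not the exact CroSSL loss.

```python
import torch
import torch.nn as nn

enc_a = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 64))   # modality A encoder (dims illustrative)
enc_b = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))    # modality B encoder
aggregator = nn.Sequential(nn.Linear(2 * 64, 128), nn.ReLU(), nn.Linear(128, 64))

def masked_global(xa, xb, p_mask=0.5):
    """Encode each modality, randomly zero out (mask) whole latent vectors,
    then fuse into one global embedding — a missing modality is just a mask."""
    za, zb = enc_a(xa), enc_b(xb)
    keep_a = (torch.rand(za.size(0), 1) > p_mask).float()
    keep_b = (torch.rand(zb.size(0), 1) > p_mask).float()
    return aggregator(torch.cat([za * keep_a, zb * keep_b], dim=1))

def crossl_style_step(xa, xb, opt):
    """Pull two differently masked views of the same samples together
    (no negative pairs); the cosine objective here is an illustrative choice."""
    g1, g2 = masked_global(xa, xb), masked_global(xa, xb)
    loss = 1 - nn.functional.cosine_similarity(g1, g2, dim=1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```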
Sat 6:50 p.m. - 7:00 p.m. | Multi-Modal Biomarker Extraction Framework for Therapy Monitoring of Social Anxiety and Depression Using Audio and Video (Oral)
This paper introduces a framework that can be used for feature extraction relevant to monitoring the speech therapy progress of individuals suffering from social anxiety or depression. It operates multimodally (decision fusion), incorporating audio and video recordings of a patient and the corresponding interviewer at two separate test assessment sessions. The data are provided by an ongoing project at a medical institution in Germany, whose goal is to investigate whether an established speech therapy group program for adolescents, which is implemented in a stationary and semi-stationary setting, can be successfully carried out through telemedicine. The features proposed in this multimodal approach could form the basis for interpretation and analysis by medical experts and therapists, in addition to data acquired in the form of questionnaires. Extracted audio features focus on prosody (intonation, stress, rhythm, and timing), as well as predictions from a deep neural network model inspired by the Pleasure, Arousal, Dominance (PAD) emotional model space. Video features are based on a pipeline designed to enable visualization of the interaction between the patient and the interviewer in terms of Facial Emotion Recognition (FER), utilizing the mini-Xception network architecture.
Paula Andrea Pérez-Toro · Tobias Weise · Andrea Deitermann · Bettina Hoffmann · Kubilay Demir · Theresa Straetz · Elmar Noeth · Andreas Maier · Thomas Kallert · Seung Hee Yang
Sat 7:00 p.m. - 7:10 p.m. | Multimodal LLMs for health grounded in individual-specific data (Oral)
Foundation large language models (LLMs) have shown an impressive ability to solve tasks across a wide range of fields, including health. To effectively solve personalized health tasks, LLMs need the ability to ingest a diversity of data modalities that are relevant to an individual's health status. In this paper, we take a step towards creating multimodal LLMs for health that are grounded in individual-specific data by developing a framework (HeLM: Health Large Language Model for Multimodal Understanding) that enables LLMs to use high-dimensional clinical modalities to estimate underlying disease risk. HeLM encodes complex data modalities by learning an encoder that maps them into the LLM's token embedding space, and handles simple modalities like tabular data by serializing the data into text. Using data from the UK Biobank, we show that HeLM can effectively use demographic and clinical features in addition to high-dimensional time-series data to estimate disease risk. For example, HeLM achieves an AUROC of 0.75 for asthma prediction when combining tabular and spirogram data modalities, compared with 0.49 when only using tabular data. Overall, we find that HeLM outperforms or performs at parity with classical machine learning approaches across a selection of eight binary traits. Furthermore, we investigate the downstream uses of this model, such as its generalizability to out-of-distribution traits and its ability to power conversations around individual health and wellness.
Justin Cosentino · Anastasiya Belyaeva · Farhad Hormozdiari · Cory McLean · nicholas furlotte
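The mapping into the token embedding space can be sketched as a small projection that turns a spirogram curve into a few "soft tokens" spliced in front of the prompt's token embeddings; the dimensions, the placeholder embedding table, and the SpiroEncoder name are assumptions for illustration, not the HeLM implementation.

```python
import torch
import torch.nn as nn

d_model, vocab = 512, 32000                     # stand-ins for the LLM's dimensions
tok_embed = nn.Embedding(vocab, d_model)        # placeholder for the frozen LLM embedding table

class SpiroEncoder(nn.Module):
    """Maps a spirogram curve (B, T) to k soft tokens in the LLM embedding space."""
    def __init__(self, t_len=1000, k=4):
        super().__init__()
        self.k = k
        self.proj = nn.Sequential(nn.Linear(t_len, 256), nn.ReLU(), nn.Linear(256, k * d_model))
    def forward(self, spiro):
        return self.proj(spiro).view(-1, self.k, d_model)

def build_inputs(spiro, prompt_token_ids, spiro_enc):
    """Tabular data is serialized into the text prompt upstream; the spirogram's
    soft tokens are spliced in front of the prompt's token embeddings."""
    soft = spiro_enc(spiro)                       # (B, k, d_model)
    text = tok_embed(prompt_token_ids)            # (B, L, d_model)
    return torch.cat([soft, text], dim=1)         # fed to the LLM as input embeddings
```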
Sat 7:10 p.m. - 7:20 p.m. | MaxCorrMGNN: A Multi-Graph Neural Framework for Generalized Multimodal Fusion of Medical Data for Outcome Prediction (Oral)
With the emergence of multimodal electronic health records, the evidence for an outcome may be captured across multiple modalities ranging from clinical to imaging and genomic data. Predicting outcomes effectively requires fusion frameworks capable of modeling fine-grained and multi-faceted complex interactions between modality features within and across patients. We develop an innovative fusion approach called MaxCorrMGNN that models non-linear modality correlations within and across patients through Hirschfeld-Gebelein-Rényi maximal correlation (MaxCorr) embeddings, resulting in a multi-layered graph that preserves the identities of the modalities and patients. We then design, for the first time, a generalized multi-layered graph neural network (MGNN) for task-informed reasoning in multi-layered graphs, which learns the parameters defining patient-modality graph connectivity and message passing in an end-to-end fashion. We evaluate our model on an outcome prediction task on a Tuberculosis (TB) dataset, consistently outperforming several state-of-the-art neural, graph-based and traditional fusion techniques.
Niharika D'Souza · Hongzhi Wang · Andrea Giovannini · Antonio Foncubierta-Rodríguez · Kristen Beck · Orest Boyko · Tanveer Syeda-Mahmood
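HGR maximal correlation embeddings are commonly learned with a differentiable surrogate such as the soft-HGR objective, sketched below for two modalities; the feature dimensions are illustrative, the exact objective used by MaxCorrMGNN may differ, and the multi-layered GNN built on top of the embeddings is omitted.

```python
import torch
import torch.nn as nn

f_net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 16))   # modality 1 -> MaxCorr features
g_net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))    # modality 2 -> MaxCorr features

def soft_hgr_loss(x1, x2):
    """Negative soft-HGR objective: maximizing E[f^T g] - 1/2 tr(cov(f) cov(g))
    pushes the two learned feature maps toward maximally correlated embeddings."""
    f, g = f_net(x1), g_net(x2)
    f, g = f - f.mean(0), g - g.mean(0)                  # zero-mean features
    n = f.size(0)
    inner = (f * g).sum(1).mean()
    cov_f, cov_g = f.T @ f / (n - 1), g.T @ g / (n - 1)
    return -(inner - 0.5 * torch.trace(cov_f @ cov_g))
```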
Sat 7:20 p.m. - 8:00 p.m. | Panel and Closing (Discussion panel)
Julia Schnabel · Andreas Maier · Pallavi Tiwari · Oliver Stegle · Daniel Rueckert · Ulas Bagci · Xiaoxiao Li