Applying machine learning (ML) in healthcare is gaining momentum rapidly. However, the black-box characteristics of existing ML approaches inevitably limit the interpretability and verifiability of clinical predictions. To enhance the interpretability of medical intelligence, it is critical to develop methodologies that explain predictions, as these systems are being introduced pervasively into the healthcare domain, which requires a higher level of safety and security. Such methodologies would make medical decisions more trustworthy and reliable for physicians, which could ultimately facilitate deployment. In addition, it is essential to develop more interpretable and transparent ML systems. For instance, by exploiting structured knowledge or prior clinical information, one can design models that learn aspects more aligned with clinical reasoning. Doing so may also help mitigate biases in the learning process or identify variables that are more relevant to medical decisions.

In this workshop, we aim to bring together researchers in ML, computer vision, healthcare, medicine, NLP, public health, computational biology, biomedical informatics, and clinical fields to facilitate discussions of related challenges, definitions, formalisms, and evaluation protocols for interpretable medical machine intelligence. The workshop will be in a large-attendance talk format, with about 150 attendees expected. It appeals to ICML audiences because interpretability is a major challenge for deploying ML in critical domains such as healthcare. By providing a platform that fosters potential collaborations and discussions between attendees, we hope the workshop offers a step toward building autonomous clinical decision systems with a higher-level understanding of interpretability.
Fri 12:15 p.m. - 12:30 p.m. | Welcoming remarks and introduction
Fri 12:30 p.m. - 1:00 p.m. | Invited talk: Pallavi Tiwari
Fri 1:00 p.m. - 1:30 p.m. | Invited talk: Jimeng Sun
Fri 1:30 p.m. - 1:40 p.m. | Poster spotlight #1
Fri 1:40 p.m. - 2:10 p.m. | Posters I and coffee break
Fri 2:10 p.m. - 2:40 p.m. | Invited talk: Rajesh Ranganath - Have we learned to explain?
Interpretability enriches what can be gleaned from a good predictive model. Techniques that learn-to-explain have arisen because they require only a single evaluation of a model to provide an interpretation. I will discuss a flaw with several methods that learn-to-explain: the optimal explainer makes the prediction rather than highlighting the inputs that are useful for prediction, and I will discuss how to correct this flaw. Along the way, I will develop evaluations grounded in the data and convey why interpretability techniques need to be quantitatively evaluated before their use.
References:
Have We Learned to Explain?: How Interpretability Methods Can Learn to Encode Predictions in their Interpretations: https://arxiv.org/pdf/2103.01890.pdf
FastSHAP: Real-Time Shapley Value Estimation: https://arxiv.org/pdf/2107.07436.pdf
Don't be fooled: label leakage in explanation methods and the importance of their quantitative evaluation: https://arxiv.org/pdf/2302.12893.pdf
New-Onset Diabetes Assessment Using Artificial Intelligence-Enhanced Electrocardiography: https://arxiv.org/pdf/2205.02900.pdf
Fri 2:40 p.m. - 3:10 p.m. | Invited talk: Quanzheng Li
Fri 3:10 p.m. - 3:40 p.m. | Invited talk: Himabindu Lakkaraju
Fri 3:40 p.m. - 4:30 p.m. | Lunch break
Fri 4:30 p.m. - 5:00 p.m. | Invited talk: Irene Chen - Building Equitable Algorithms: Modeling Access to Healthcare in Disease Phenotyping
Advances in machine learning and the explosion of clinical data have demonstrated immense potential to fundamentally improve clinical care and deepen our understanding of human health. However, algorithms for medical interventions and scientific discovery in heterogeneous patient populations are particularly challenged by the complexities of healthcare data. Not only are clinical data noisy, missing, and irregularly sampled, but questions of equity and fairness also raise grave concerns and create additional computational challenges. In this talk, I examine how to incorporate differences in access to care into the modeling step. Using a deep generative model, we examine the task of disease phenotyping in heart failure and Parkinson's disease. The talk concludes with a discussion of how to rethink the entire machine learning pipeline through an ethical lens in order to build algorithms that serve the entire patient population.
Fri 5:00 p.m. - 5:30 p.m. | Invited talk: Alex Lang - How to get over your Black Box trust issues?
Only the bravest machine learners have dared to tackle problems in medicine. Why? The most important reason is that the end users of ML models in medicine are skeptics of ML, and therefore one must jump through a multitude of hoops in order to deploy ML solutions. The common approach in the field is to focus on interpretability and force our ML solutions to be white box. However, this handcuffs the potential of our ML models from the start, and medicine is already a challenging enough space to model since data is hard to collect, the data one gets is always messy, and the tasks one must achieve in medicine are often not as intuitive as working on images or text. Is there another way? Yes! Our approach is to embrace black box ML solutions, but deploy them carefully in clinical trials by rigorously controlling the risk exposure from trusting the ML solutions. I will use Alzheimer’s disease as an example to dive into our state-of-the-art deep time series neural networks. Once I have explained our black box as best as a human reasonably can, I will detail how the outputs of the deep nets can be used in different clinical trials. In these applications, the end user prespecifies their risk tolerance, which leads to different contexts of use for the ML models. Our work demonstrates that we can embrace black box solutions by focusing on developing rigorous deployment methods.
Fri 5:30 p.m. - 6:00 p.m. | Invited talk: Cihang Xie
Fri 6:00 p.m. - 6:20 p.m. | Coffee break
Fri 6:20 p.m. - 6:50 p.m. | Invited talk: Dr. Judy Gichoya - Harnessing the ability of AI models to detect hidden signals - how can we explain these findings?
Deep learning models have been demonstrated to have superhuman performance in predicting features that are not obvious to human readers. For example, AI can predict patients' self-reported race, age, sex, diagnosis, and insurance. While some of these features are biological, most are social constructs, and given the black box nature of the models it remains difficult to assess how this ability is achieved. In this session, we will review some of the approaches, both technical and non-technical, for understanding the performance of these models, which has an impact on real-world deployment of AI.
Fri 6:50 p.m. - 7:30 p.m. | Poster spotlight #2
Fri 7:30 p.m. - 7:35 p.m. | Closing remarks
Fri 7:35 p.m. - 8:00 p.m. | Posters II and coffee break

Counterfactual Optimization of Treatment Policies Based on Temporal Point Process (Poster)
In high-stakes areas such as healthcare, it is interesting to ask counterfactual questions: what if some executed treatments had been performed earlier/later or changed to other types? Answering such questions can help us debug the observational treatment policies and further improve the treatment strategy. Existing methods mainly focus on generating the whole counterfactual trajectory, which provides overwhelming information and lacks specific feedback on improving certain actions. In this paper, we propose a counterfactual treatment optimization framework where we optimize specific treatment actions by sampling counterfactual symptom rollouts and meanwhile satisfying medical rule constraints. Our method can not only help people debug their specific treatments but also has strong robustness when training data are limited.
Zilin JING · Chao Yang · Shuang Li

(Un)reasonable Allure of Ante-hoc Interpretability for High-stakes Domains: Transparency Is Necessary but Insufficient for Explainability (Poster)
Ante-hoc interpretability has become the holy grail of explainable machine learning for high-stakes domains such as healthcare; however, this notion is elusive, lacks a widely-accepted definition and depends on the deployment context. It can refer to predictive models whose structure adheres to domain-specific constraints, or ones that are inherently transparent. The latter notion assumes observers who judge this quality, whereas the former presupposes technical and domain expertise on their part, in certain cases rendering such models unintelligible. Additionally, its distinction from the less desirable post-hoc explainability, which refers to methods that construct a separate explanatory model, is vague given that transparent models may still require (post-)processing to generate admissible explanatory insights. Ante-hoc interpretability is thus an overloaded concept that spans a range of implicit properties, which we unpack in this paper to better understand what is needed for its safe deployment across high-stakes domains. To this end, we outline model- and explainer-specific desiderata that allow us to navigate its distinct realisations in view of the envisaged application domain and audience.
Kacper Sokol · Julia Vogt

Feature Importance Measurement based on Decision Tree Sampling (Poster)
Random forests are effective for prediction tasks, but the randomness of tree generation hinders interpretability in feature importance analysis. To address this, we propose a SAT-based method for measuring feature importance in tree-based models. Our method has fewer parameters than random forests and provides higher interpretability and stability for the analysis of real-world problems.
CHAO HUANG · Diptesh Das · Koji Tsuda

Identifying Inequity in Treatment Allocation (Poster)
Disparities in resource allocation, efficacy of care, and patient outcomes along demographic lines have been documented throughout the healthcare system. In order to reduce such health disparities, it is crucial to quantify uncertainty and biases in the medical decision-making process. In this work, we propose a novel setup to audit inequity in treatment allocation. We develop multiple bounds on the treatment allocation rate, under different strengths of assumptions, which leverage risk estimates via standard classification models. We demonstrate the effectiveness of our approach in assessing racial and ethnic inequity in COVID-19 outpatient Paxlovid allocation. We provably show that, for all groups, patients who would die without treatment receive Paxlovid at most 53% of the time, highlighting substantial under-allocation of resources. Furthermore, we illuminate discrepancies between racial subgroups, showing that the rate at which patients who would die without treatment receive Paxlovid is at most 6% and 27% lower for Black patients than for White and Asian patients, respectively.
Yewon Byun · Dylan Sam · Zachary Lipton · Bryan Wilder

DBGDGM: A Dynamic Brain Graph Deep Generative Model (Poster)
Graphs are a natural representation of brain activity derived from functional magnetic resonance imaging (fMRI) data. It is well known that communities of nodes extracted from brain graphs, referred to as functional connectivity networks (FCNs), serve as useful biomarkers for understanding brain function and dysfunction. Previous works, however, ignore the temporal dynamics of the brain and focus on static graph representations. In this paper we propose DBGDGM, a dynamic brain graph deep generative model which simultaneously learns graph-, node-, and community-level embeddings in an unsupervised fashion. Specifically, DBGDGM represents brain graph nodes as embeddings sampled from a distribution over communities that evolve over time. The community distribution is parameterized using neural networks that learn from subject and node embeddings as well as past community assignments. Experiments on real-world fMRI data demonstrate that DBGDGM outperforms state-of-the-art baselines in graph generation and dynamic link prediction, and is comparable for graph classification. Finally, an interpretability analysis of the learnt community distributions reveals overlap with known FCNs reported in the neuroscience literature.
Simeon Spasov · Alexander Campbell · Nicola Toschi · Pietro Lió

DynDepNet: Learning Time-Varying Dependency Structures from fMRI Data via Dynamic Graph Structure Learning (Poster & Oral)
Graph neural networks (GNNs) have demonstrated success at learning representations of brain graphs derived from functional magnetic resonance imaging (fMRI) data. The majority of existing GNN methods, however, assume brain graphs are static over time and that the graph adjacency matrix is known prior to model training. These assumptions are at odds with neuroscientific evidence that brain graphs are time-varying with a connectivity structure that depends on the choice of functional connectivity measure. Noisy brain graphs that do not truly represent the underlying fMRI data can have a detrimental impact on the performance of GNNs. As a solution, we propose DynDepNet, a novel method for learning the optimal time-varying dependency structure of fMRI data induced by a downstream prediction task. Experiments on real-world resting-state as well as task fMRI datasets for the task of biological sex classification demonstrate that DynDepNet achieves state-of-the-art results, outperforming the best baseline in terms of accuracy by approximately 8 and 6 percentage points, respectively. Moreover, analysis of the learnt dynamic graphs highlights prediction-related brain regions which align with existing neuroscience literature.
Alexander Campbell · Antonio Zippo · Luca Passamonti · Nicola Toschi · Pietro Lió

A Survey on Knowledge Graphs for Healthcare: Resources, Application Progress, and Promise (Poster)
Healthcare knowledge graphs (HKGs) have emerged as a promising tool for organizing medical knowledge in a structured and interpretable way, which provides a comprehensive view of medical concepts and their relationships. However, challenges such as data heterogeneity and limited coverage remain, emphasizing the need for further research in the field of HKGs. This survey paper serves as the first comprehensive overview of HKGs. We summarize the pipeline and key techniques for HKG construction (i.e., from scratch and through integration), as well as the common utilization approaches (i.e., model-free and model-based). To provide researchers with valuable resources, we organize existing HKGs based on the data types they capture and application domains, supplemented with pertinent statistical information. In the application section, we delve into the transformative impact of HKGs across various healthcare domains, spanning from fine-grained basic science research to high-level clinical decision support. Lastly, we shed light on the opportunities for creating comprehensive and accurate HKGs in the era of large language models, presenting the potential to revolutionize healthcare delivery and enhance the interpretability and reliability of clinical prediction.
Hejie Cui · Jiaying Lu · Shiyu Wang · Ran Xu · Wenjing Ma · Shaojun Yu · Yue Yu · Xuan Kan · Tianfan Fu · Chen Ling · Joyce Ho · Fei Wang · Carl Yang

A ChatGPT Aided Explainable Framework for Zero-Shot Medical Image Diagnosis (Poster)
Zero-shot medical image classification is a critical process in real-world scenarios where we have limited access to all possible diseases or large-scale annotated data. It involves computing similarity scores between a query medical image and possible disease categories to determine the diagnostic result. Recent advances in pretrained vision-language models (VLMs) such as CLIP have shown great performance for zero-shot natural image recognition and exhibit benefits in medical applications. However, an explainable zero-shot medical image recognition framework with promising performance is yet under development. In this paper, we propose a novel CLIP-based zero-shot medical image classification framework supplemented with ChatGPT for explainable diagnosis, mimicking the diagnostic process performed by human experts. The key idea is to query large language models (LLMs) with category names to automatically generate additional cues and knowledge, such as disease symptoms or descriptions other than a single category name, to help provide more accurate and explainable diagnosis in CLIP. We further design specific prompts to enhance the quality of generated texts by ChatGPT that describe visual medical features. Extensive results on one private dataset and four public datasets along with detailed analysis demonstrate the effectiveness and explainability of our training-free zero-shot diagnosis pipeline, corroborating the great potential of VLMs and LLMs for medical applications.
Jiaxiang Liu · Tianxiang Hu · Yan Zhang · Xiaotang Gai · YANG FENG · Zuozhu Liu
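The scoring step this abstract describes (comparing an image embedding against embeddings of LLM-generated class descriptions) can be sketched in a few lines. The snippet below is an illustration of that general idea only, not the authors' pipeline; the precomputed image embedding and the `encode_text` function are hypothetical stand-ins for a CLIP-style encoder, and the class descriptions are assumed to come from prompting an LLM with each category name. The per-description similarities also give a natural explanation: the highest-scoring generated descriptions can be reported alongside the predicted class.

```python
import numpy as np

def l2_normalize(x):
    # Normalize embeddings so that dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

def zero_shot_diagnose(image_embedding, class_descriptions, encode_text):
    """Score each disease class by the mean similarity between the image
    and the LLM-generated descriptions of that class."""
    image_embedding = l2_normalize(image_embedding)
    scores = {}
    for name, descriptions in class_descriptions.items():
        text_emb = l2_normalize(np.stack([encode_text(d) for d in descriptions]))
        scores[name] = float((text_emb @ image_embedding).mean())  # mean cosine similarity
    best = max(scores, key=scores.get)
    return best, scores
```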

A Pipeline for Interpretable Clinical Subtyping with Deep Metric Learning (Poster & Oral)
Clinical subtyping, a critical component of personalized medicine, classifies patients with a particular disease into distinct subgroups based on their unique features. However, conventional data-driven subtyping approaches often entail a manual characterization of the identified clusters, complicating the task due to the high dimensionality and heterogeneity of the data. In this work, we propose a novel framework for interpretable clinical subtyping using deep metric learning. Our proposed pipeline unifies prior approaches to clinical subtyping, and introduces automatic characterization of the learned clusters in an interpretable and clinically meaningful manner. We showcase the effectiveness of this framework on real-world clinical case studies, demonstrating its utility in uncovering actionable clinical knowledge.
Haoran Zhang · Qixuan Jin · Thomas Hartvigsen · Miriam Udler · Marzyeh Ghassemi

Signature Activation: A Sparse Signal View for Holistic Saliency (Poster & Oral)
The adoption of machine learning in healthcare calls for model transparency and explainability. In this work, we introduce Signature Activation, a saliency method that generates holistic and class-agnostic explanations for Convolutional Neural Networks' outputs. We exploit the sparsity of images and give a theoretical explanation to justify our method. We show the potential use of our method in clinical settings by evaluating its efficacy in aiding the detection of lesions in coronary angiograms.
Jose Tello Ayala · Akl Fahed · Weiwei Pan · Eugene Pomerantsev · Patrick Ellinor · Anthony Philippakis · Finale Doshi-Velez

Explanation-guided dynamic feature selection for medical risk prediction (Poster)
In medical risk prediction scenarios, machine learning methods have demonstrated an ability to learn complex and predictive relationships among rich feature sets. However, in practice, when faced with new patients, we may not have access to all information expected by a trained risk model. We propose a framework to simultaneously provide flexible risk estimates for samples with missing features, as well as context-dependent feature recommendations to identify what piece of information may be most valuable to collect next. Our approach uses a fixed prediction model, a local feature explainer, and ensembles of imputed samples to generate risk prediction intervals and feature recommendations. Applied to a myocardial infarction risk prediction task in the UK Biobank dataset, we find that our approach can more efficiently predict risk of a heart attack with fewer observed features than traditional fixed imputation and global feature selection methods.
Nicasia Beebe-Wang · Wei Qiu · Su-In Lee

Rethinking Medical Report Generation: Disease Revealing Enhancement with Knowledge Graph (Poster)
Knowledge graph (KG) is an important component in medical report generation because it can reveal the relations among diseases and thus is often utilized during the generation process. However, the collection of a comprehensive KG is time-consuming and its usage is under-explored. In this paper, we construct a complete KG on chest images that includes most types of diseases and/or abnormalities. We further explore the usage of a KG in different directions. Firstly, by designing a rule-based criterion to classify disease types at the sentence level, we find that long-tailed problems exist in the disease distribution and that generated reports from current advanced methods are far from being clinically useful. We alleviate the long-tailed distribution problem through a new augmentation strategy that increases the disease types in the tailed distribution. A two-stage generation approach based on image-level classification results is proposed in parallel to better capture "disease-specific" information. On the other hand, radiologists evaluate generated reports on whether they describe the diseases appearing in the input image. Following this idea, we propose diverse sensitivity (DS), a new metric that checks whether generated diseases match the ground truth and measures the diversity of all generated diseases. We observe that current leading methods cannot generate satisfying results, and that the proposed two-stage generation framework and augmentation strategies improve DS by a considerable margin.
Yixin Wang · Zihao Lin · Haoyu Dong

ADMIRE++: Explainable Anomaly Detection in the Human Brain via Inductive Learning on Temporal Multiplex Networks (Poster & Oral)
Understanding the human brain is an intriguing goal for neuroscience research. Due to recent advances in machine learning on graphs, representing the connections of the human brain as a network has become one of the most pervasive analytical paradigms. However, most existing graph machine learning-based methods suffer from a subset of three critical limitations: they are (1) designed for one type of data (e.g., fMRI or sMRI) and one individual subject, limiting their ability to use complementary information provided by different images, (2) designed in supervised or transductive settings, limiting their generalizability to unseen patterns, and (3) black-box models designed for classifying brain networks, limiting their ability to reveal underlying patterns that might cause the symptoms of a disease or disorder. To address these limitations, we present ADMIRE, an inductive and unsupervised anomaly detection method for multimodal brain networks that can detect anomalous patterns in the brains of people living with a disease or disorder. It uses two different causal multiplex walks, inter-view and intra-view, to automatically extract and learn temporal network motifs. It then uses an anonymization strategy to hide node and relation type identities, keeping the model inductive. We then propose a simple, tree-based explainable model, ADMIRE++, to explain ADMIRE predictions. Our experiments on Parkinson's Disease, Attention Deficit Hyperactivity Disorder, and Autism Spectrum Disorder show the efficiency and effectiveness of our approaches in detecting anomalous brain activity.
Ali Behrouz · Margo Seltzer

Reframing the Brain Age Prediction Problem to a More Interpretable and Quantitative Approach (Poster)
Deep learning models have achieved state-of-the-art results in estimating brain age, an important brain health biomarker, from magnetic resonance (MR) images. However, most of these models only provide a global age prediction and rely on techniques such as saliency maps to interpret their results. These saliency maps highlight regions in the input image that were significant for the model's predictions, but they are hard to interpret, and saliency map values are not directly comparable across different samples. In this work, we reframe the age prediction problem from MR images as an image-to-image regression problem where we estimate the brain age for each brain voxel in MR images. We compare voxel-wise age prediction models against global age prediction models and their corresponding saliency maps. Our preliminary results indicate that voxel-wise age prediction models are more interpretable, since they provide spatial information about the brain aging process, and they benefit from being quantitative.
Neha Gianchandani · Mahsa Dibaji · Mariana Bento · Ethan MacDonald · Roberto Souza

A Unifying Framework to the Analysis of Interaction Methods using Synergy Functions (Poster & Oral)
Deep learning is expected to revolutionize many sciences and particularly healthcare and medicine. However, deep neural networks are generally “black box,” which limits their applicability to mission-critical applications in health. Explaining such models would improve transparency and trust in AI-powered decision making and is necessary for understanding other practical needs such as robustness and fairness. A popular means of enhancing model transparency is to quantify how individual inputs contribute to model outputs (called attributions) and the magnitude of interactions between groups of inputs. A growing number of these methods import concepts and results from game theory to produce attributions and interactions. This work presents a unifying framework for game-theory-inspired attribution and k-th order interaction methods. We show that, given modest assumptions, a unique full account of interactions between features, called synergies, is possible in the continuous input setting. We identify how various methods are characterized by their policy of distributing synergies. We establish that gradient-based methods are characterized by their actions on monomials, a type of synergy function, and introduce unique gradient-based methods. We show that the combination of various criteria uniquely defines the attribution/interaction methods. Thus, the community needs to identify goals and contexts when developing and employing attribution and interaction methods. Finally, experiments with Physicochemical Properties of Protein Tertiary Structure data indicate that the proposed method has favorable performance against the state-of-the-art approach.
Daniel Lundstrom · Ali Ghafelebashi · Meisam Razaviyayn

An interpretable data augmentation framework for improving generative modeling of synthetic clinical trial data (Poster & Oral)
Synthetic clinical trial data are increasingly being seen as a viable option for research applications when primary data are unavailable. A challenge when applying generative modeling approaches for this purpose is that many clinical trial datasets have small sample sizes. In this paper, we present an interpretable data augmentation framework for improving generative models used to produce synthetic clinical trial data. We apply this framework to three clinical trial datasets spanning different disease indications and evaluate the impact of factors such as initial dataset size, generative algorithm, and augmentation scale on metrics used to assess synthetic clinical trial data quality, including fidelity, utility, and privacy. The results indicate that this framework can considerably improve the quality of synthetic data produced using generative algorithms when considering factors of high interest to end users of synthetic clinical trial data.
Afrah Shafquat · Mandis Beigi · Chufan Gao · Jason Mezey · Jimeng Sun · Jacob Aptekar

Automated Detection of Interpretable Causal Inference Opportunities: Regression Discontinuity Subgroup Discovery (Poster)
Treatment decisions based on cutoffs of continuous variables, such as the blood sugar threshold for diabetes diagnosis, provide valuable opportunities for causal inference. Regression discontinuities (RDs) are used to analyze such scenarios, where units just above and below the threshold differ only in their treatment assignment status, thus providing as-if randomization. In practice, however, implementing RD studies can be difficult, as identifying treatment thresholds requires considerable domain expertise. Furthermore, the thresholds may differ across population subgroups (e.g., the blood sugar threshold for diabetes may differ across demographics), and ignoring these differences can lower statistical power. Here, we introduce Regression Discontinuity SubGroup Discovery (RDSGD), a machine learning method that identifies more powerful and interpretable subgroups for RD thresholds. Using a claims dataset with over 60 million patients, we apply our method to multiple clinical contexts and identify subgroups with increased compliance to treatment assignment thresholds. As subgroup-specific treatment thresholds are relevant to many diseases, RDSGD can be a powerful tool for discovering new avenues for causal estimation across a range of clinical applications.
Tong Liu · Patrick Lawlor · Lyle Ungar · Konrad Kording · Rahul Ladhania

Prospectors: Leveraging Short Contexts to Mine Salient Objects in High-dimensional Imagery (Poster)
High-dimensional imagery consists of high-resolution information required for end-user decision-making. Due to computational constraints, current methods for image-level classification are designed to train with image chunks or down-sampled images rather than with the full high-resolution context. While these methods achieve impressive classification performance, they often lack visual grounding and, thus, the post hoc capability to identify class-specific, salient objects under weak supervision. In this work, we (1) propose a formalized evaluation framework to assess visual grounding in high-dimensional image applications. To present a challenging benchmark, we leverage a real-world segmentation dataset for post hoc mask evaluation. We use this framework to characterize visual grounding of various baseline methods across multiple encoder classes, exploring multiple supervision regimes and architectures (e.g. ResNet, ViT). Finally, we (2) present prospector heads: a novel class of adaptation architectures designed to improve visual grounding. Prospectors leverage chunk heterogeneity to identify salient objects over long ranges and can interface with any image encoder. We find that prospectors outperform baselines by upwards of +6 balanced accuracy points and +30 precision points in a gigapixel pathology setting. Through this experimentation, we also show how prospectors can enable many classes of encoders to identify salient objects without re-training and also demonstrate their improved performance against classical explanation techniques (e.g. Attention maps).
Gautam Machiraju · Arjun Desai · James Zou · Christopher Re · Parag Mallick

Bridging the Gap: From Post Hoc Explanations to Inherently Interpretable Models for Medical Imaging (Poster)
ML model design either starts with an interpretable model or a Blackbox (BB) and explains it post hoc. BB models are flexible but difficult to explain, while interpretable models are inherently explainable. Yet, interpretable models require extensive ML knowledge and tend to be less flexible and underperforming than their BB variants. This paper aims to blur the distinction between a post hoc explanation of a BB and constructing interpretable models. Beginning with a BB, we iteratively carve out a mixture of interpretable experts and a residual network. Each interpretable model specializes in a subset of samples and explains them using First Order Logic (FOL). We route the remaining samples through a flexible residual. We repeat the method on the residual network until all the interpretable models explain the desired proportion of data. Our extensive experiments show that our approach (1) identifies a diverse set of instance-specific concepts without compromising the performance of the BB, (2) identifies the relatively "harder" samples to explain via residuals, and (3) transfers efficiently to an unknown target domain with limited data. The code is available at https://github.com/AI09-guy/IMLH-submission.
Shantanu Ghosh · Ke Yu · Forough Arabshahi · Kayhan Batmanghelich

Designing optimal tests for slow converging Markov chains (Poster)
We design a Neyman-Pearson test for differentiating between two Markov Chains using a relatively small number of samples compared to the state space size or the mixing time. We assume the transition matrices corresponding to the null and alternative hypothesis are known but the initial distribution is not known. We bound the error using ideas from large deviation theory but in a non-asymptotic setting. As an application, using scRNA-seq data, we design a Neyman-Pearson test for inferring whether a given distribution of RNA expressions from a murine pancreatic tissue sample corresponds to a given transition matrix or not, using only a small number of cell samples.
Pratik Worah · Clifford Stein

Interpretable Ensemble-based Deep Learning Approach for Automated Detection of Macular Telangiectasia Type 2 by Optical Coherence Tomography (Poster)
We present an ensemble-based approach using deep learning models for the accurate and interpretable detection of Macular Telangiectasia Type 2 (MacTel) from a large dataset of Optical Coherence Tomography (OCT) scans. Leveraging data from the MacTel Project by the Lowy Medical Research Institute and the University of Washington, our dataset consists of 5200 OCT scans from 780 MacTel patients and 1820 non-MacTel patients. Employing ResNet18 and ResNet50 architectures as supervised learning models along with the AdaBoost algorithm, we predict the presence of MacTel in patients and reflect on interpretability based on the Grad-CAM technique to identify critical regions in OCT images influencing the models' predictions. We propose building weak learners for the AdaBoost ensemble by not only varying the architecture but also varying amounts of labeled data available for training neural networks to improve the accuracy and interpretability. Our study contributes to interpretable machine learning in healthcare, showcasing the efficacy of ensemble techniques for accurate and interpretable detection of rare retinal diseases like MacTel.
Shahrzad Gholami · Lea Scheppke · Rahul Dodhia · Juan Lavista Ferres · Aaron Lee

Semi-supervised Ordinal Regression via Cumulative Link Models for Predicting In-Hospital Length-of-Stay (Poster)
Length-of-stay prediction has been widely studied as a classification task (e.g. will patients stay 0-3 days, 3-7 days, or more than 7 days?). Yet previous approaches neglect the natural ordering of these classes: standard multi-class classification treats classes as unordered, while methods that build separate binary classifiers for each class struggle to enforce coherent probabilistic predictions across classes. Instead, we suggest that cumulative link models, an ordinal regression approach long known in statistics, are a naturally coherent approach well suited to length-of-stay classification. We view ordinal regression as an output layer that can be integrated into any training pipeline based on automatic differentiation. We show this output layer yields improved predictions over binary classifier alternatives when paired with either neural net or hidden Markov model representations of patient vital sign history, all while requiring fewer parameters. Further experiments show promise in a semi-supervised setting, where only some patients have observed outcomes.
Alexander Lobo · Preetish Rath · Michael Hughes
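To make the cumulative link output layer concrete, here is a minimal PyTorch sketch of the standard ordered-logit construction the abstract alludes to (one latent score per patient plus ordered cutpoints). It illustrates the general technique rather than the authors' code; the toy linear encoder, class count, and labels are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CumulativeLinkHead(nn.Module):
    """Ordered-logit output layer: P(y <= k) = sigmoid(c_k - score)."""
    def __init__(self, num_classes):
        super().__init__()
        # Unconstrained parameters; softplus + cumsum keeps the cutpoints ordered.
        self.raw_deltas = nn.Parameter(torch.zeros(num_classes - 1))

    def cutpoints(self):
        deltas = F.softplus(self.raw_deltas) + 1e-3
        return torch.cumsum(deltas, dim=0) - deltas.sum() / 2   # centered, increasing

    def forward(self, score):
        # score: (batch,) latent severity from any differentiable encoder.
        cuts = self.cutpoints()                                       # (K-1,)
        cdf = torch.sigmoid(cuts.unsqueeze(0) - score.unsqueeze(1))   # (batch, K-1)
        ones = torch.ones(score.shape[0], 1, device=score.device)
        zeros = torch.zeros_like(ones)
        upper = torch.cat([cdf, ones], dim=1)
        lower = torch.cat([zeros, cdf], dim=1)
        return upper - lower                                          # (batch, K) class probabilities

# Toy example: a linear encoder over 10 vital-sign features feeding the ordinal head.
encoder = nn.Linear(10, 1)
head = CumulativeLinkHead(num_classes=3)
x = torch.randn(4, 10)
probs = head(encoder(x).squeeze(-1))
loss = F.nll_loss(torch.log(probs + 1e-8), torch.tensor([0, 1, 2, 1]))
loss.backward()
```

Because the per-class probabilities come from differences of a single monotone CDF, they are coherent by construction, which is the property the abstract contrasts with separate binary classifiers.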

Learning where to intervene with a differentiable top-k operator: Towards data-driven strategies to prevent fatal opioid overdoses (Poster)
Public health organizations need to decide how best to prioritize and target interventions in the most effective manner, given many candidate locations but a limited budget. We consider learning from historical opioid overdose events to predict where to intervene among many candidate spatial regions. Recent work has suggested performance metrics that grade models by how well they recommend a top-K set of regions, computing in hindsight the fraction of events in the actual top-K regions that are covered by the recommendation. We show how to directly optimize such metrics, using advances in perturbed optimizers that allow end-to-end gradient-based training. Experiments suggest that on real opioid-related overdose events from 1620 census tracts in Massachusetts, our end-to-end neural approach selects 100 tracts for intervention better than purpose-built statistical models and tough-to-beat historical baselines.
Kyle Heuton · Shikhar Shrestha · Thomas Stopka · Michael Hughes
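The perturbed-optimizer idea referenced above can be sketched as a Monte-Carlo smoothed top-k mask with the gradient estimator in the style of Berthet et al. (2020). The snippet below is a generic illustration of that trick, not the authors' implementation; the score and event tensors are synthetic stand-ins, and the noise scale and sample count are arbitrary choices.

```python
import torch

class PerturbedTopK(torch.autograd.Function):
    """Smoothed top-k membership mask via Gaussian perturbations of the scores."""

    @staticmethod
    def forward(ctx, scores, k, sigma, n_samples):
        noise = torch.randn(n_samples, scores.shape[0], device=scores.device)
        perturbed = scores.unsqueeze(0) + sigma * noise             # (n, d)
        idx = torch.topk(perturbed, k, dim=-1).indices
        masks = torch.zeros_like(perturbed).scatter_(-1, idx, 1.0)  # hard top-k per sample
        ctx.save_for_backward(noise, masks)
        ctx.sigma = sigma
        return masks.mean(dim=0)                                    # soft top-k membership

    @staticmethod
    def backward(ctx, grad_output):
        noise, masks = ctx.saved_tensors
        # d mask_i / d score_j is estimated by E[mask_i * noise_j] / sigma.
        weights = (masks * grad_output.unsqueeze(0)).sum(dim=-1, keepdim=True)  # (n, 1)
        grad_scores = (weights * noise).mean(dim=0) / ctx.sigma
        return grad_scores, None, None, None

# Toy use: learn scores so the soft top-100 selection covers many historical events.
scores = torch.randn(1620, requires_grad=True)        # one score per census tract
events = torch.rand(1620)                             # synthetic historical event counts
soft_mask = PerturbedTopK.apply(scores, 100, 0.5, 256)
coverage = (soft_mask * events).sum() / events.sum()  # differentiable top-K coverage proxy
(-coverage).backward()                                # gradients flow back to the scores
```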

Verifiable Feature Attributions: A Bridge between Post Hoc Explainability and Inherent Interpretability (Poster)
As machine learning models are increasingly employed in medicine, researchers, healthcare organizations, providers, and patients have all emphasized the need for greater transparency. To provide explanations of models in high-stakes applications, two broad strategies have been outlined in prior literature. Post hoc explanation methods explain the behaviour of complex black-box models by highlighting image regions critical to model predictions; however, prior work has shown that these explanations may not be faithful, and even more concerning is our inability to verify them. Specifically, it is nontrivial to evaluate if a given feature attribution is correct with respect to the underlying model. Inherently interpretable models, on the other hand, circumvent this by explicitly encoding explanations into the model architecture, making their explanations naturally faithful and verifiable, but they often exhibit poor predictive performance due to their limited expressive power. In this work, we aim to bridge the gap between the aforementioned strategies by proposing Verifiability Tuning (VerT), a method that transforms black-box models into models with verifiable feature attributions. We begin by introducing a formal theoretical framework to understand verifiability and show that attributions produced by standard models cannot be verified. We then leverage this framework to propose a method for building verifiable models and feature attributions from black-box models. Finally, we perform extensive experiments on semi-synthetic and real-world datasets, and show that VerT produces models that (1) yield explanations that are correct and verifiable and (2) are faithful to the original black-box models they are meant to explain.
Usha Bhalla · Suraj Srinivas · Himabindu Lakkaraju

Mask, Stitch, and Re-Sample: Enhancing Robustness and Generalizability in Anomaly Detection through Automatic Diffusion Models (Poster)
The introduction of diffusion models in anomaly detection has paved the way for more effective and accurate image reconstruction in pathologies. However, the current limitations in controlling noise granularity hinder the ability of diffusion models to generalize across diverse anomaly types and compromise the restoration of healthy tissues. To overcome these challenges, we propose AutoDDPM, a novel approach that enhances the robustness of diffusion models. AutoDDPM utilizes diffusion models to generate initial likelihood maps of potential anomalies and seamlessly integrates them with the original image. Through joint noised distribution re-sampling, AutoDDPM achieves harmonization and in-painting effects. Our study demonstrates the efficacy of AutoDDPM in replacing anomalous regions while preserving healthy tissues, considerably surpassing diffusion models' limitations. It also contributes valuable insights and analysis on the limitations of current diffusion models, promoting robust and interpretable anomaly detection in medical imaging — an essential aspect of building autonomous clinical decision systems with higher interpretability.
Cosmin Bercea · Michael Neumayr · Daniel Rueckert · Julia Schnabel

Adverse event prediction using a task-specific generative model (Poster)
Longitudinal data analysis is essential in various fields, providing insights into associations between interpretable explanatory variables and temporal response variables. Recent progress in generative modelling has demonstrated models that can learn low-dimensional representations of complex longitudinal data and capture intricate interactions between high-dimensional features. Ideally, the trained generative model can be used for various downstream tasks, such as data generation, prediction and classification. In this work, we evaluate the performance of the longitudinal variational autoencoder model in predicting adverse events in clinical trials. We also propose a general training approach that can learn versatile generative models while simultaneously optimising performance on a specific downstream task. Our experiments on two simulated datasets and one clinical trial dataset demonstrate that the proposed training objective provides results that are either comparable or better than results obtained with the standard training methods. Our results also suggest that longitudinal information is useful for adverse event prediction in clinical trials.
Otto Lönnroth · Siddharth Ramchandran · Pekka Tiikkainen · Mine Öğretir · Jussi Leinonen · Harri Lähdesmäki

Interpreting Differentiable Latent States for Healthcare Time-series Data (Poster)
Machine learning enables extracting clinical insights from large temporal datasets. The applications of such machine learning models include identifying disease patterns and predicting patient outcomes. However, limited interpretability poses challenges for deploying advanced machine learning in digital healthcare. Understanding the meaning of latent states is crucial for interpreting machine learning models, assuming they capture underlying patterns. In this paper, we present a concise algorithm that allows for i) interpreting latent states using highly related input features; ii) interpreting predictions using subsets of input features via latent states; and iii) interpreting changes in latent states over time. The proposed algorithm is feasible for any model that is differentiable. We demonstrate that this approach enables the identification of a daytime behavioral pattern for predicting nocturnal behavior in a real-world healthcare dataset.
Yu Chen · Nivedita Bijlani · Samaneh Kouchaki · Payam Barnaghi
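As a rough illustration of how gradients of a differentiable model can link inputs, latent states, and predictions (steps i and ii above), consider the sketch below. The GRU encoder, linear head, feature dimensions, and chosen latent unit are placeholders rather than the authors' model or algorithm.

```python
import torch
import torch.nn as nn

encoder = nn.GRU(input_size=8, hidden_size=16, batch_first=True)   # latent states over time
head = nn.Linear(16, 1)                                            # e.g. a nocturnal-behaviour score

x = torch.randn(1, 24, 8, requires_grad=True)    # one day of hourly features (synthetic)
states, _ = encoder(x)                           # (1, 24, 16) latent states
prediction = head(states[:, -1])                 # predict from the final latent state

# (i) Which input features drive latent unit j at time t?
j, t = 3, 12
grads_inputs = torch.autograd.grad(states[0, t, j], x, retain_graph=True)[0]
feature_relevance = grads_inputs.abs().sum(dim=1).squeeze(0)        # (8,) one score per input feature

# (ii) Which latent units drive the prediction?
grads_latent = torch.autograd.grad(prediction.sum(), states)[0]
latent_relevance = grads_latent.abs().sum(dim=1).squeeze(0)         # (16,) one score per latent unit

print(feature_relevance.argsort(descending=True)[:3])   # most related input features
print(latent_relevance.argsort(descending=True)[:3])    # most prediction-relevant latent units
```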

Self-verification improves few-shot clinical information extraction (Poster)
Extracting patient information from unstructured text is a critical task in health decision-support and clinical research. Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning, in contrast to supervised learning, which requires costly human annotations. However, despite drastic advances, modern LLMs such as GPT-4 still struggle with issues regarding accuracy and interpretability, especially in safety-critical domains such as health. We explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs. This framework is made possible by the asymmetry between verification and generation, where the former is often much easier than the latter. Experimental results show that our method consistently improves accuracy for various LLMs across standard clinical information extraction tasks. Additionally, self-verification yields interpretations in the form of a short text span corresponding to each output, which makes it efficient for human experts to audit the results, paving the way towards trustworthy extraction of clinical information in resource-constrained scenarios. To facilitate future research in this direction, we release our code and prompts.
Zelalem Gero · Chandan Singh · Hao Cheng · Tristan Naumann · Michel Galley · Jianfeng Gao · Hoifung Poon
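An extract-then-verify loop of the kind described above can be sketched as follows. This is an illustration only: `call_llm` is a hypothetical stand-in for whichever LLM client is available, and the prompts are illustrative, not the authors' released prompts.

```python
import json

def call_llm(prompt):
    # Hypothetical stand-in; replace with your own LLM client call.
    raise NotImplementedError("plug in an LLM client here")

def extract_fields(note, fields):
    # Few-shot/zero-shot extraction of the requested fields as JSON.
    prompt = (
        "Extract the following fields from the clinical note as JSON "
        f"({', '.join(fields)}). Use null if a field is absent.\n\nNote:\n{note}"
    )
    return json.loads(call_llm(prompt))

def verify_fields(note, extraction):
    """Ask the model to justify each extracted value with a verbatim span from the
    note; drop values it cannot ground. The kept spans double as audit evidence."""
    verified = {}
    for field, value in extraction.items():
        if value is None:
            continue
        prompt = (
            f"Quote the exact sentence from the note that supports "
            f"{field} = {value!r}. If there is no support, answer NONE.\n\nNote:\n{note}"
        )
        evidence = call_llm(prompt).strip()
        if evidence != "NONE" and evidence in note:   # provenance must appear verbatim
            verified[field] = {"value": value, "evidence": evidence}
    return verified
```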

Interpreting deep embeddings for disease progression clustering (Poster)
We propose a novel approach for interpreting deep embeddings in the context of patient clustering. We evaluate our approach on a dataset of participants with type 2 diabetes from the UK Biobank, and demonstrate clinically meaningful insights into disease progression patterns.
Anna Munoz-Farre · Antonios Poulakakis-Daktylidis · Dilini Kothalawala · Andrea Rodriguez-Martinez

Explainable Deep Learning for Disease Activity Prediction in Chronic Inflammatory Joint Diseases (Poster)
Analysing complex diseases such as chronic inflammatory joint diseases, where many factors influence the disease evolution, is a challenging task. We propose an explainable attention-based neural network model, trained on data from patients with different arthritis subtypes, for predicting future disease activity scores. The network transforms longitudinal patient journeys into comparable representations, allowing for additional case-based explanations via computed patient journey similarities. We show how these similarities allow us to rank different patient characteristics in terms of impact on disease progression and discuss how case-based explanations can enhance the transparency of deep learning solutions.
Cécile Trottet · Ahmed Allam · Raphael Micheroli · Aron Horvath · Michael Krauthammer · Caroline Ospelt

Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation (Poster)
This paper seeks to address dense labeling problems where a significant fraction of the dataset can be pruned without sacrificing much accuracy. We observe that, on standard medical image segmentation benchmarks, the loss gradient norm-based metrics of individual training examples applied in image classification fail to identify the important samples. To address this issue, we propose a data pruning method that takes into consideration the training dynamics on target regions using a Dynamic Average Dice (DAD) score. To the best of our knowledge, we are among the first to address data importance in dense labeling tasks in the field of medical image analysis, making the following contributions: (1) investigating the underlying causes with rigorous empirical analysis, and (2) determining an effective data pruning approach for dense labeling problems. Our solution can be used as a strong yet simple baseline to select important examples for medical image segmentation with combined data sources.
YongKang He · Mingjin Chen · Yongyi Lu · Zhijing Yang
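One plausible reading of a DAD-style score is to track each example's Dice on its target region across training epochs and average. The sketch below follows that reading for illustration only; the exact definition, the pruning direction, and the keep fraction are the paper's design choices and are not reproduced here.

```python
import numpy as np

def dice(pred_mask, true_mask, eps=1e-6):
    # Binary masks as 0/1 or boolean arrays.
    inter = np.logical_and(pred_mask, true_mask).sum()
    return (2.0 * inter + eps) / (pred_mask.sum() + true_mask.sum() + eps)

class DADTracker:
    """Accumulates each training example's Dice on its target region per epoch."""
    def __init__(self, num_examples):
        self.history = [[] for _ in range(num_examples)]

    def update(self, idx, pred_mask, true_mask):
        self.history[idx].append(dice(pred_mask, true_mask))

    def scores(self):
        # Average Dice over epochs (the "dynamic average" for each example).
        return np.array([np.mean(h) if h else np.nan for h in self.history])

    def keep_indices(self, keep_fraction=0.8):
        # Illustrative rule only: keep the hardest examples (lowest average Dice).
        s = self.scores()
        order = np.argsort(s)                     # ascending: hardest first
        return order[: int(keep_fraction * len(s))]
```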
-
|
Risk-adjusted Training and Evaluation for Medical Object Detection in Breast Cancer MRI
(
Poster
)
link »
Medical object detection revolves around discovering and rating lesions and other objects, with the most common way of measuring performance being FROC (Free-response Receiver Operating Characteristic), which calculates sensitivity at predefined thresholds of false positives per case. However, in a diagnosis or screening setting not all lesions are equally important, because small indeterminate lesions have limited clinical significance, while failing to detect and correctly classify high risk lesions can potentially hinder clinical prognosis and treatment options. It is therefore cardinal to correctly account for this risk imbalance in the way machine learning models are developed and evaluated. In this work, we propose risk-adjusted FROC (raFROC), an adaptation of FROC that constitutes a first step on reflecting the underlying clinical need more accurately. Experiments on two different breast cancer datasets with a total of 1535 lesions in 1735 subjects showcase the clinical relevance of the proposed metric and its advantages over traditional evaluation methods. Additionally, by utilizing a risk-adjusted adaptation of focal loss (raFocal) we are able to improve the raFROC results and patient-level performance of nnDetection, a state-of-the-art medical object detection framework, at no expense of the regular FROC. |
Dimitrios Bounias · Michael Baumgartner · Peter Neher · Balint Kovacs · Ralf Floca · Paul F. Jaeger · Lorenz Kapsner · Jessica Eberle · Dominique Hadler · Frederik Laun · Sabine Ohlmeyer · Klaus Maier-Hein · Sebastian Bickelhaupt
|
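A toy sketch of the risk-adjustment idea: sensitivity at each false-positives-per-case level is computed with per-lesion risk weights rather than treating all lesions equally. The weighting scheme below is illustrative, not the raFROC definition from the paper.

```python
import numpy as np

def risk_adjusted_sensitivity(detected, risk_weights):
    """Risk-weighted sensitivity at one operating point.

    detected:     boolean flags, one per ground-truth lesion, marking whether
                  the lesion was found at this false-positives-per-case level.
    risk_weights: per-lesion clinical risk weights (higher for high-risk
                  lesions); the exact weighting scheme is an assumption.
    """
    detected = np.asarray(detected, dtype=float)
    w = np.asarray(risk_weights, dtype=float)
    return float((detected * w).sum() / w.sum())

# Toy example: three lesions, the high-risk one (weight 3) is missed,
# so risk-adjusted sensitivity is lower than the unweighted 2/3.
print(risk_adjusted_sensitivity([True, True, False], [1.0, 1.0, 3.0]))
print(risk_adjusted_sensitivity([True, True, False], [1.0, 1.0, 1.0]))
```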
-
|
Curve your Enthusiasm: Concurvity Regularization in Differentiable Generalized Additive Models
(
Poster
)
link »
Generalized Additive Models (GAMs) have recently experienced a resurgence in popularity, particularly in high-stakes domains such as healthcare. GAMs are favored due to their interpretability, which arises from expressing the target value as a sum of non-linear functions of the predictors. Despite the current enthusiasm for GAMs, their susceptibility to concurvity, i.e., (possibly non-linear) dependencies between the predictors, has hitherto been largely overlooked. Here, we demonstrate how concurvity can severely impair the interpretability of GAMs and propose a remedy: a conceptually simple yet effective regularizer which penalizes pairwise correlations of the non-linearly transformed feature variables. This procedure is applicable to any gradient-based fitting of differentiable additive models, such as Neural Additive Models or NeuralProphet, and enhances interpretability by eliminating ambiguities due to self-canceling feature contributions. We validate the effectiveness of our regularizer in experiments on synthetic as well as real-world datasets for time-series and tabular data. Our experiments show that concurvity in GAMs can be reduced without significantly compromising prediction quality, improving interpretability and reducing variance in the feature importances. |
Julien Siems · Konstantin Ditschuneit · Winfried Ripken · Alma Lindborg · Maximilian Schambach · Johannes Otterbach · Martin Genzel 🔗 |
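The regularizer described above can be sketched in a few lines of PyTorch: penalize the average absolute off-diagonal entry of the correlation matrix of the per-feature contributions. This is a sketch of the stated idea under simple assumptions, not the authors' implementation.

```python
import torch

def concurvity_penalty(feature_contributions, eps=1e-8):
    """Penalize pairwise correlation between per-feature contributions.

    feature_contributions: tensor of shape (batch, n_features), where column j
    holds f_j(x_j), the output of the j-th shape function of an additive model.
    """
    x = feature_contributions - feature_contributions.mean(dim=0, keepdim=True)
    x = x / (x.std(dim=0, keepdim=True) + eps)
    corr = (x.T @ x) / x.shape[0]                   # (n_features, n_features)
    off_diag = corr - torch.diag(torch.diag(corr))  # zero the diagonal
    return off_diag.abs().mean()

# Typical use inside a NAM-style training loop (names hypothetical):
# loss = mse_loss(prediction, target) + lam * concurvity_penalty(contributions)
```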
-
|
Learning replacement variables in interpretable rule-based models
(
Poster
)
link »
Rule models are favored in many prediction tasks due to their interpretation using natural language and their simple presentation. When learned from data, they can provide high predictive performance, on par with more complex models. However, in the presence of incomplete input data at test time, standard rule models' predictions are undefined or ambiguous. In this work, we consider learning compact yet accurate rule models with missing values at both training and test time, based on the notion of replacement variables. We propose a method called MINTY which learns rules in the form of disjunctions between variables that act as replacements for each other when one or more is missing. This results in a sparse linear rule model that naturally allows a trade-off between interpretability and goodness of fit while being sensitive to missing values at test time. We demonstrate the concept of MINTY in preliminary experiments and compare the predictive performance to baselines. |
Lena Stempfle · Fredrik Johansson 🔗 |
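A small sketch of the replacement-variable idea, assuming binary features with NaN marking missing values; the rule semantics and scoring below are simplified illustrations, not the MINTY algorithm or its learning procedure.

```python
import numpy as np

def eval_disjunction(x, members):
    """Evaluate a disjunctive rule OR(x_j for j in members) with missing data.

    If at least one member is observed and true, the rule fires; if all
    observed members are false, it does not; observed members thus act as
    replacements for missing ones. Only if every member is missing is the
    rule undefined.
    """
    values = [x[j] for j in members if not np.isnan(x[j])]
    if not values:
        return np.nan
    return float(any(v >= 0.5 for v in values))

def predict(x, rules, weights, bias=0.0):
    """Sparse linear model over disjunctive rules (undefined rules contribute 0)."""
    feats = np.array([eval_disjunction(x, r) for r in rules])
    feats = np.nan_to_num(feats, nan=0.0)
    return bias + feats @ np.asarray(weights)

x = np.array([1.0, np.nan, 0.0])
print(predict(x, rules=[(0, 1), (1, 2)], weights=[0.8, -0.3]))  # -> 0.8
```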
-
|
Interpretable Alzheimer’s Disease Classification Via a Contrastive Diffusion Autoencoder.
(
Poster
)
link »
In visual object classification, humans often justify their choices by comparing objects to prototypical examples within that class. We may therefore increase the interpretability of deep learning models by imbuing them with a similar style of reasoning. In this work, we apply this principle by classifying Alzheimer's Disease based on the similarity of images to training examples within the latent space. We use a contrastive loss combined with a diffusion autoencoder backbone to produce a semantically meaningful latent space, such that neighbouring latents have similar image-level features. We achieve a classification accuracy comparable to black-box approaches on a dataset of 2D MRI images, whilst producing human-interpretable model explanations. Therefore, this work contributes to the development of accurate and interpretable deep learning within medical imaging. |
Ayodeji Ijishakin · Ahmed Abdulaal · Adamos Hadjivasiliou · Sophie Martin · James Cole 🔗 |
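A sketch of prototype-style classification in a learned latent space: predict by majority vote over the nearest training latents and return those neighbours as example-based explanations. The encoder producing the latents (e.g. a diffusion autoencoder) is assumed and not shown; this is an illustration of the retrieval-based reasoning, not the paper's pipeline.

```python
import numpy as np
from collections import Counter

def knn_latent_classify(z_query, z_train, y_train, k=5):
    """Classify by majority vote among the k nearest training latents; the
    indices of those neighbours double as example-based explanations."""
    y_train = np.asarray(y_train)
    dists = np.linalg.norm(np.asarray(z_train) - np.asarray(z_query), axis=1)
    nearest = np.argsort(dists)[:k]
    label = Counter(y_train[nearest].tolist()).most_common(1)[0][0]
    return label, nearest

# Toy usage with random 16-dimensional latents.
rng = np.random.default_rng(0)
z_train = rng.normal(size=(50, 16))
y_train = rng.integers(0, 2, size=50)
print(knn_latent_classify(rng.normal(size=16), z_train, y_train))
```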
-
|
SepVAE: a contrastive VAE to separate pathological patterns from healthy ones.
(
Poster
)
link »
Contrastive Analysis VAEs (CA-VAEs) are a family of Variational auto-encoders (VAEs) that aims at separating the common factors of variation between a background dataset (BG) (i.e., healthy subjects) and a target dataset (TG) (i.e., patients) from the ones that only exist in the target dataset. To do so, these methods separate the latent space into a set of salient features (i.e., proper to the target dataset) and a set of common features (i.e., existing in both datasets). Currently, all models fail to effectively prevent the sharing of information between latent spaces and to capture all salient factors of variation. To this end, we introduce two crucial regularization losses: a disentangling term between common and salient representations and a classification term between background and target samples in the salient space. We show better performance than previous CA-VAE methods on three medical applications and a natural images dataset (CelebA). Code and datasets are available at https://anonymous.4open.science/r/sep_vae-0D94/. |
Robin Louiset · Edouard Duchesnay · Antoine Grigis · Benoit Dufumier · Pietro Gori 🔗 |
-
|
Better Calibration Error Estimation for Reliable Uncertainty Quantification
(
Poster
)
link »
Reliable uncertainty quantification is crucial in high-stakes applications, such as healthcare. The $\text{ECE}_\text{EW}$ has been the most commonly used estimator to quantify the calibration error (CE), but it is heavily biased and can significantly underestimate the true calibration error. While alternative estimators, such as $\text{ECE}_\text{DEBIASED}$ and $\text{ECE}_\text{SWEEP}$, achieve smaller estimation bias in comparison, they exhibit a trade-off between overestimation of the CE on uncalibrated models and underestimation on recalibrated models. To address this trade-off, we propose a new estimator based on K-Nearest Neighbors (KNN), called $\text{ECE}_\text{KNN}$, which constructs representative overlapping local neighbourhoods for improved CE estimation. Empirical evaluation results demonstrate that $\text{ECE}_\text{KNN}$ simultaneously achieves near-zero underestimation of the CE on uncalibrated models while also achieving lower degrees of overestimation on recalibrated models.
|
Shuman Peng · Parsa Alamzadeh · Martin Ester 🔗 |
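A rough sketch of a nearest-neighbour calibration error estimate, using a 1-D neighbourhood over sorted confidence scores: each sample's confidence is compared with the empirical accuracy of the samples whose confidences are closest to its own. The authors' $\text{ECE}_\text{KNN}$ estimator is more refined, so treat this only as an illustration of the local-neighbourhood idea.

```python
import numpy as np

def ece_knn_sketch(confidences, correct, k=100):
    """Nearest-neighbour-style calibration error estimate (illustrative only).

    confidences: predicted confidence of the top class, shape (n,)
    correct:     1 if the prediction was correct, else 0, shape (n,)
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    order = np.argsort(confidences)
    conf_sorted, corr_sorted = confidences[order], correct[order]
    n = len(conf_sorted)
    gaps = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - k // 2), min(n, i + k // 2 + 1)
        gaps[i] = abs(conf_sorted[i] - corr_sorted[lo:hi].mean())
    return gaps.mean()
```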
-
|
Generating Global Factual and Counterfactual Explainer for Molecule under Domain Constraints
(
Poster
)
link »
Graph neural networks (GNNs) are powerful tools for handling graph-structured data but often lack transparency. This paper aims to generate interpretable global explanations for GNN predictions, focusing on real-world scenarios like chemical molecules. We develop an approach that produces both factual and counterfactual explanations while incorporating domain constraints, ensuring validity and interpretability for domain experts. Our contributions include creating global explanations, integrating domain constraints, and improving random walk in global explanations using fragment-based editing. We demonstrate the effectiveness of our approach on AIDS and Mutagenicity datasets, providing a comprehensive understanding of GNNs and aiding domain experts in evaluating generated explanations. |
Danqing Wang · Antonis Antoniades · Ambuj Singh · Lei Li 🔗 |
-
|
TabCBM: Concept-based Interpretable Neural Networks for Tabular Data
(
Poster
)
link »
Concept-based interpretability addresses a deep neural network's opacity by constructing explanations for its predictions using high-level units of information referred to as concepts. Research in this area, however, has been mainly focused on image and graph-structured data, leaving high-stakes medical and genomic tasks whose data is tabular out of reach of existing methods. In this paper, we address this gap by introducing the first definition of what a high-level concept may entail in tabular data. We use this definition to propose Tabular Concept Bottleneck Models (TabCBMs), a family of interpretable self-explaining neural architectures capable of learning high-level concept explanations for tabular tasks without concept annotations. We evaluate our method in synthetic and real-world tabular tasks and show that it outperforms or performs competitively against state-of-the-art methods while providing a high level of interpretability as measured by its ability to discover known high-level concepts. Finally, we show that TabCBM can discover important high-level concepts in synthetic datasets inspired by critical tabular tasks (e.g., single-cell RNAseq) and allows for human-in-the-loop concept interventions in which an expert can correct mispredicted concepts to boost the model's performance. |
Mateo Espinosa Zarlenga · Zohreh Shams · Michael Nelson · Been Kim · Mateja Jamnik 🔗 |
-
|
Continuous Time Evidential Distributions for Irregular Time Series
(
Poster
)
link »
Irregular time series, prevalent in many real-world settings such as healthcare, are challenging to formulate predictions from. It is difficult to infer the value of a feature at any given time when observations are sporadic, as it could take on a range of values depending on when it was last observed. To characterize this uncertainty we present EDICT, a strategy that learns an evidential distribution over irregular time series in continuous time. This distribution enables well-calibrated and flexible inference of partially observed features at any time of interest, while expanding uncertainty temporally for sparse, irregular observations. We demonstrate that EDICT attains competitive performance on challenging time series classification tasks and enables uncertainty-guided inference when encountering noisy data. |
Taylor Killian · Haoran Zhang · Thomas Hartvigsen · Ava Amini 🔗 |
-
|
Consistent Explanations in the Face of Model Indeterminacy
(
Poster
)
link »
This work addresses the challenge of providing consistent explanations for predictive models in the presence of model indeterminacy, which arises due to the existence of multiple (nearly) equally well-performing models for a given dataset and task. Despite their similar performance, such models often exhibit inconsistent or even contradictory explanations for their predictions, posing challenges to end users who rely on them to make critical decisions. Recognizing this, we introduce ensemble methods as an approach to enhance the consistency of the explanations provided in these scenarios. Leveraging insights from recent work on neural network loss landscapes and mode connectivity, we devise ensemble strategies to efficiently explore the underspecification set: the set of models with performance variations resulting solely from changes in the random seed during training. Experiments on five benchmark financial datasets reveal that ensembling can yield significant improvements when it comes to explanation similarity, and demonstrate the potential of existing ensemble methods to explore the underspecification set efficiently. Our findings highlight the importance of considering model indeterminacy when interpreting explanations and showcase the effectiveness of ensembles in enhancing the reliability of explanations in machine learning. |
Dan Ley · Leonard Tang · Matthew Nazari · Hongjin Lin · Suraj Srinivas · Himabindu Lakkaraju 🔗 |
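One simple instantiation of the ensembling idea is to average attributions across models trained with different seeds and report their spread; the paper explores several ensembling strategies, so the sketch below is only illustrative.

```python
import numpy as np

def ensemble_attributions(models, explain_fn, x):
    """Average feature attributions over an ensemble of near-equivalent models.

    models:     models trained from different random seeds (an approximation
                of the underspecification set discussed above).
    explain_fn: any attribution method with signature explain_fn(model, x)
                returning a (d,) array of feature attributions.
    Returns the mean attribution and its per-feature standard deviation,
    the latter indicating where the models' explanations disagree.
    """
    attributions = np.stack([explain_fn(m, x) for m in models])
    return attributions.mean(axis=0), attributions.std(axis=0)
```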
-
|
Eye-tracking of clinician behaviour with explainable AI decision support: a high-fidelity simulation study
(
Poster
)
link »
Explainable AI (XAI) is seen as important for AI-driven clinical decision support tools, but most XAI has been evaluated on non-expert populations for proxy tasks and in low-fidelity settings. The rise of generative AI and the potential safety risk of hallucinatory AI suggestions causing patient harm have once again highlighted the question of whether XAI can act as a safety mitigation mechanism. We studied intensive care doctors in a high-fidelity simulation suite with eye-tracking glasses on a prescription dosing task to better understand their interaction dynamics with XAI (for both intentionally safe and unsafe, i.e. hallucinatory, AI suggestions). We show that it is feasible to perform eye-tracking and that the attention devoted to any of four types of XAI does not differ between safe and unsafe AI suggestions. This calls into question the utility of XAI as a mitigation against patient harm from clinicians erroneously following poor quality AI advice. |
Myura Nagendran · Paul Festor · Matthieu Komorowski · Anthony Gordon · Aldo Faisal 🔗 |
-
|
Eye-tracking of clinician behaviour with explainable AI decision support: a high-fidelity simulation study
(
Oral
)
link »
Explainable AI (XAI) is seen as important for AI-driven clinical decision support tools, but most XAI has been evaluated on non-expert populations for proxy tasks and in low-fidelity settings. The rise of generative AI and the potential safety risk of hallucinatory AI suggestions causing patient harm have once again highlighted the question of whether XAI can act as a safety mitigation mechanism. We studied intensive care doctors in a high-fidelity simulation suite with eye-tracking glasses on a prescription dosing task to better understand their interaction dynamics with XAI (for both intentionally safe and unsafe, i.e. hallucinatory, AI suggestions). We show that it is feasible to perform eye-tracking and that the attention devoted to any of four types of XAI does not differ between safe and unsafe AI suggestions. This calls into question the utility of XAI as a mitigation against patient harm from clinicians erroneously following poor quality AI advice. |
Myura Nagendran · Paul Festor · Matthieu Komorowski · Anthony Gordon · Aldo Faisal 🔗 |
-
|
Echocardiographic Clustering by Machine Learning in Children with Early Surgically Corrected Congenital Heart Disease
(
Poster
)
link »
This research investigates time-series clustering of echocardiography data in children with surgically corrected congenital heart disease (CHD). In recent years, machine learning has been demonstrated to discover sophisticated latent patterns in medical data, yet relevant explainable applications in pediatric cardiology remain lacking. To address this issue, we propose an autoencoder-based architecture to effectively model time-series data with interpretable outcomes. The proposed method outperforms the baseline models in terms of internal clustering metrics. The three clusters also show distinct differences in patients' outcomes. The data mining results can potentially help clinicians stratify patients' prognoses based on echocardiographic and clinical observations in the future. |
Wei-Hsuan Chien · Cristian Rodriguez Rivero · Stijn Haas · Mitchel Molenaar 🔗 |
-
|
Echocardiographic Clustering by Machine Learning in Children with Early Surgically Corrected Congenital Heart Disease
(
Oral
)
link »
This research investigates time-series clustering of echocardiography data in children with surgically corrected congenital heart disease (CHD). In recent years, machine learning has been demonstrated to discover sophisticated latent patterns in medical data, yet relevant explainable applications in pediatric cardiology remain lacking. To address this issue, we propose an autoencoder-based architecture to effectively model time-series data with interpretable outcomes. The proposed method outperforms the baseline models in terms of internal clustering metrics. The three clusters also show distinct differences in patients' outcomes. The data mining results can potentially help clinicians stratify patients' prognoses based on echocardiographic and clinical observations in the future. |
Wei-Hsuan Chien · Cristian Rodriguez Rivero · Stijn Haas · Mitchel Molenaar 🔗 |
-
|
Auditing for Human Expertise
(
Poster
)
link »
High-stakes prediction tasks (e.g., patient diagnosis) are often handled by trained human experts. A common source of concern about automation in these settings is that experts may exercise intuition that is difficult to model and/or have access to information (e.g., conversations with a patient) that is simply unavailable to a would-be algorithm. This raises the natural question of whether human experts add value that could not be captured by an algorithmic predictor. In this work, we develop a statistical framework under which we can pose this question as a natural hypothesis test. We highlight the utility of our procedure using admissions data collected from the emergency department of a large academic hospital system, where we show that physicians' admit/discharge decisions for patients with acute gastrointestinal bleeding (AGIB) appear to be incorporating information not captured in a standard algorithmic screening tool. This is despite the fact that the screening tool is arguably more accurate than physicians' discretionary decisions, highlighting that, even absent normative concerns about accountability or interpretability, accuracy is insufficient to justify algorithmic automation. |
Rohan Alur · Loren Laine · Darrick Li · Manish Raghavan · Devavrat Shah · Dennis Shung 🔗 |
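A crude permutation-style proxy for this kind of question: check whether adding the expert's decision to the algorithmic score improves cross-validated prediction of the outcome more than a permuted version of that decision would. The function below and its details are assumptions for illustration, not the authors' statistical procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def expertise_signal_test(algo_score, expert_decision, outcome,
                          n_perm=200, seed=0):
    """Permutation-style check of whether expert decisions carry predictive
    information about outcomes beyond an algorithmic risk score."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([algo_score, expert_decision]).astype(float)
    y = np.asarray(outcome)
    clf = LogisticRegression(max_iter=1000)
    observed = cross_val_score(clf, X, y, cv=5).mean()
    null_scores = []
    for _ in range(n_perm):
        Xp = X.copy()
        Xp[:, 1] = rng.permutation(Xp[:, 1])   # break the expert signal
        null_scores.append(cross_val_score(clf, Xp, y, cv=5).mean())
    p_value = (np.sum(np.asarray(null_scores) >= observed) + 1) / (n_perm + 1)
    return observed, p_value
```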
-
|
Generating Explanations to understand Fatigue in Runners Using Time Series Data from Wearable Sensors
(
Poster
)
link »
Running while fatigued poses an increased risk of injury. Wearable sensors can be used to capture the running kinematics or running pattern as time series signals. The changes that happen in the running pattern due to fatigue, although prominent enough to increase the risk of injury, are generally only seen as subtle differences in the signal itself and hence are difficult to differentiate using purely visual inspection. In this paper, we introduce a time series dataset of motion capture data from runners before and after a fatiguing intervention. The total dataset consists of more than 5500 instances and was collected from 19 participants. The evaluation presented in this paper first looks at the effectiveness of a data aggregation technique called time series barycenters which is shown to improve classification performance. We evaluate and compare a set of classifiers and explanation methods for this problem, and select the most informative classifier and explanation for this dataset. We then present feedback from a domain expert on the insights offered by the explanations. |
Bahavathy Kathirgamanathan · Padraig Cunningham 🔗 |
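A minimal sketch of aggregating repeated stride signals into a single exemplar: a plain element-wise mean of equal-length series is shown, whereas alignment-aware barycenters (e.g. DTW barycenter averaging) would be the natural alternative when signals are not time-aligned. This is a simplification of the barycenter idea, not the paper's exact procedure.

```python
import numpy as np

def euclidean_barycenter(series):
    """Element-wise mean of equal-length time series (a simple barycenter)."""
    series = np.asarray(series)          # shape: (n_series, length)
    return series.mean(axis=0)

# Aggregate several noisy copies of the same stride signal into one exemplar.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 128)
strides = np.sin(t)[None, :] + 0.1 * rng.normal(size=(20, 128))
exemplar = euclidean_barycenter(strides)
print(exemplar.shape)  # (128,)
```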
-
|
ProtoGate: Prototype-based Neural Networks with Local Feature Selection for Tabular Biomedical Data
(
Poster
)
link »
Tabular biomedical data poses challenges in machine learning because it is often high-dimensional and typically low-sample-size. Previous research has attempted to address these challenges via feature selection approaches, which can lead to unstable performance and insufficient interpretability on real-world data. This suggests that current methods lack appropriate inductive biases that capture informative patterns in different samples. In this paper, we propose ProtoGate, a local feature selection method that introduces an inductive bias by attending to the clustering characteristic of biomedical data. ProtoGate selects features in a global-to-local manner and leverages them to produce explainable predictions via an interpretable prototype-based model. We conduct comprehensive experiments to evaluate the performance of ProtoGate on synthetic and real-world datasets. Our results show that exploiting the homogeneous and heterogeneous patterns in the data can improve prediction accuracy while prototypes imbue interpretability. |
Xiangjian Jiang · Andrei Margeloiu · Nikola Simidjievski · Mateja Jamnik 🔗 |
-
|
Understanding the Size of the Feature Importance Disagreement Problem in Real-World Data
(
Poster
)
link »
Feature importance can be used to gain insight in prediction models. However, different feature importance methods might result in different generated explanations, which has recently been coined as the explanation disagreement problem. Little is known about the size of the disagreement problem in real-world data. Such disagreements are harmful in practice as conflicting explanations only make prediction models less transparent to end users, which contradicts the main goal of these methods. Hence, it is important to empirically analyze and understand the feature importance disagreement problem in real-world data. We present a novel evaluation framework to measure the influence of different elements of data complexity on the size of the disagreement problem by modifying real-world data. We investigate the feature importance disagreement problem in two datasets from the Dutch general practitioners database IPCI and two open-source datasets. |
Aniek Markus · Egill Fridgeirsson · Jan Kors · Katia Verhamme · Jenna Reps · Peter Rijnbeek 🔗 |
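Disagreement between feature importance methods can be quantified in a few lines, e.g. by top-k overlap and rank correlation between two importance vectors; these are generic measures, not necessarily the ones used in the paper's evaluation framework.

```python
import numpy as np
from scipy.stats import spearmanr

def topk_agreement(imp_a, imp_b, k=5):
    """Fraction of shared features among the top-k of two importance vectors."""
    top_a = set(np.argsort(-np.abs(imp_a))[:k])
    top_b = set(np.argsort(-np.abs(imp_b))[:k])
    return len(top_a & top_b) / k

def rank_agreement(imp_a, imp_b):
    """Spearman rank correlation between two importance vectors."""
    return spearmanr(np.abs(imp_a), np.abs(imp_b))[0]

# Example: importances for the same model from two different methods.
a = np.array([0.50, 0.10, 0.05, 0.30, 0.02])
b = np.array([0.05, 0.45, 0.06, 0.35, 0.01])
print(topk_agreement(a, b, k=2), rank_agreement(a, b))
```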
-
|
Is Task-Agnostic Explainable AI a Myth?
(
Poster
)
link »
Our work serves as a framework for unifying the challenges of contemporary explainable AI (XAI). We demonstrate that while XAI methods provide supplementary and potentially useful output for machine learning models, researchers and decision-makers should be mindful of their conceptual and technical limitations, which frequently result in these methods themselves becoming black boxes. We examine three XAI research avenues spanning image, textual, and graph data, covering saliency, attention, and graph-type explainers. Despite the varying contexts and timeframes of the mentioned cases, the same persistent roadblocks emerge, highlighting the need for a conceptual breakthrough in the field to address the challenge of compatibility between XAI methods and application tasks. |
Alicja Chaszczewicz 🔗 |
-
|
Is Task-Agnostic Explainable AI a Myth?
(
Oral
)
link »
Our work serves as a framework for unifying the challenges of contemporary explainable AI (XAI). We demonstrate that while XAI methods provide supplementary and potentially useful output for machine learning models, researchers and decision-makers should be mindful of their conceptual and technical limitations, which frequently result in these methods themselves becoming black boxes. We examine three XAI research avenues spanning image, textual, and graph data, covering saliency, attention, and graph-type explainers. Despite the varying contexts and timeframes of the mentioned cases, the same persistent roadblocks emerge, highlighting the need for a conceptual breakthrough in the field to address the challenge of compatibility between XAI methods and application tasks. |
Alicja Chaszczewicz 🔗 |
-
|
Robust Ranking Explanations
(
Poster
)
link »
Robust explanations of machine learning models are critical to establish human trust in the models. Due to limited cognition capability, most humans can only interpret the top few salient features. It is critical to make top salient features robust to adversarial attacks, especially those against the more vulnerable gradient-based explanations. Existing defenses measure robustness using $\ell_p$-norms, which offer weaker protection. We define explanation thickness for measuring the ranking stability of salient features, and derive tractable surrogate bounds of the thickness to design the R2ET algorithm to efficiently maximize the thickness and anchor top salient features. Theoretically, we prove a connection between R2ET and adversarial training. Experiments with a wide spectrum of network architectures and data modalities, including brain networks, demonstrate that R2ET attains higher explanation robustness under stealthy attacks while retaining accuracy.
|
Chao Chen · Chenghua Guo · Guixiang Ma · Ming Zeng · Xi Zhang · Sihong Xie 🔗 |
-
|
Participatory Personalization in Classification
(
Poster
)
link »
Machine learning models are often personalized based on information that is protected, sensitive, self-reported, or costly to acquire. These models use information about people, but neither facilitate nor inform their consent. Individuals cannot opt out of reporting information that a model needs to personalize their predictions, nor tell if they would benefit from personalization in the first place. We introduce a new family of prediction models, called participatory systems, that let individuals opt into personalization at prediction time. We present a model-agnostic algorithm to learn participatory systems for supervised learning tasks where models are personalized with categorical group attributes. We conduct a comprehensive empirical study of participatory systems in clinical prediction tasks, comparing them to common approaches for personalization and imputation. Experimental results demonstrate that participatory systems can facilitate and inform consent in a way that improves performance and privacy across all groups who report personal data. |
Hailey Joren · Chirag Nagpal · Katherine Heller · Berk Ustun 🔗 |
-
|
GraphChef: Learning the Recipe of Your Dataset
(
Poster
)
link »
We propose a new graph model, GraphChef, that enables us to understand graph datasets as a whole. Given a dataset, GraphChef returns a set of rules (a recipe) that describes each class in the dataset. Existing GNNs and explanation methods reason on individual graphs, not on the entire dataset. GraphChef uses decision trees to build recipes that are understandable by humans. We show how to compute decision trees in the message passing framework in order to create GraphChef. We also present a new pruning method to produce small, easy-to-digest trees. In the experiments, we present and analyze GraphChef's recipes for Reddit-Binary, MUTAG, BA-2Motifs, BA-Shapes, Tree-Cycle, and Tree-Grid. We verify the correctness of the discovered recipes against the datasets' ground truth. |
Peter Müller · Lukas Faber · Karolis Martinkus · Roger Wattenhofer 🔗 |
-
|
GraphChef: Learning the Recipe of Your Dataset
(
Oral
)
link »
We propose a new graph model, GraphChef, that enables us to understand graph datasets as a whole. Given a dataset, GraphChef returns a set of rules (a recipe) that describes each class in the dataset. Existing GNNs and explanation methods reason on individual graphs, not on the entire dataset. GraphChef uses decision trees to build recipes that are understandable by humans. We show how to compute decision trees in the message passing framework in order to create GraphChef. We also present a new pruning method to produce small, easy-to-digest trees. In the experiments, we present and analyze GraphChef's recipes for Reddit-Binary, MUTAG, BA-2Motifs, BA-Shapes, Tree-Cycle, and Tree-Grid. We verify the correctness of the discovered recipes against the datasets' ground truth. |
Peter Müller · Lukas Faber · Karolis Martinkus · Roger Wattenhofer 🔗 |
-
|
Longitudinal Variational Autoencoder for Compositional Data Analysis
(
Poster
)
link »
The analysis of compositional longitudinal data, particularly in microbiome time-series, is a challenging task due to its high-dimensional, sparse, and compositional nature. In this paper, we introduce a novel Gaussian process (GP) prior variational autoencoder for longitudinal data analysis with a multinomial likelihood (MNLVAE) that is specifically designed for compositional time-series analysis. Our generative deep learning model captures complex interactions among microbial taxa while accounting for the compositional structure of the data. We utilize centered log-ratio (CLR) and isometric log-ratio (ILR) transformations to preprocess and transform compositional count data, and a latent multi-output additive GP model to enable prediction of future observations. Our experiments demonstrate that MNLVAE outperforms competing methods, offering improved prediction performance across different longitudinal microbiome datasets. |
Mine Öğretir · Harri Lähdesmäki · Jamie Norton 🔗 |
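The CLR preprocessing step mentioned above is simple to write down; the pseudocount below is a common (assumed) way to handle zero counts, not necessarily the authors' choice.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of compositional count data.

    counts: array of shape (n_samples, n_taxa); a pseudocount avoids log(0).
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    props = x / x.sum(axis=1, keepdims=True)   # close to the simplex
    logp = np.log(props)
    return logp - logp.mean(axis=1, keepdims=True)

print(clr(np.array([[10, 0, 5], [1, 1, 1]])))
```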
-
|
Deep Learning Approach for Cardiac Electrophysiology Model Correction
(
Poster
)
link »
Imaging the electrical activity of the heart can be achieved with invasive catheterisation. However, the resulting data are sparse and noisy. Mathematical modelling of cardiac electrophysiology can help the analysis but solving the associated mathematical systems can become unfeasible. It is often computationally demanding, for instance when solving for different patient conditions. We present a new framework to model the dynamics of cardiac electrophysiology at lower cost. It is based on the integration of a low-fidelity physical model and a learning component implemented here via neural networks. The latter acts as a complement to the physical part, and handles all quantities and dynamics that the simplified physical model neglects. We demonstrate that this framework allows us to reproduce the complex dynamics of the transmembrane potential and to correctly identify the relevant physical parameters, even when only partial measurements are available. This combined model-based and data-driven approach could improve cardiac electrophysiological imaging and provide predictive tools. |
Victoriya Kashtanova · Mihaela Pop · patrick gallinari · Maxime Sermesant 🔗 |
-
|
Efficient Estimation of Local Robustness of Machine Learning Models
(
Poster
)
link »
Machine learning models often need to be robust to noisy input data. The effect of real-world noise (which is often random) on model predictions is captured by a model's local robustness, i.e., the consistency of model predictions in a local region around an input. Local robustness is therefore an important characterization of real-world model behavior and can be useful for debugging models and establishing user trust. However, the naïve approach to computing local robustness based on Monte-Carlo sampling is statistically inefficient, leading to prohibitive computational costs for large-scale applications. In this work, we develop the first analytical estimators to efficiently compute local robustness of multi-class discriminative models using local linear function approximation and the multivariate Normal CDF. Through the derivation of these estimators, we show how local robustness is connected to concepts such as randomized smoothing and softmax probability. We also confirm empirically that these estimators accurately and efficiently compute the local robustness of standard deep learning models. In addition, we demonstrate these estimators' usefulness for various tasks involving local robustness, such as measuring robustness bias and identifying examples that are vulnerable to noise perturbation in a dataset. By developing analytical estimators, this work not only deepens the understanding of local robustness but also makes its computation practical, enabling the use of local robustness in critical downstream applications. |
Tessa Han · Suraj Srinivas · Himabindu Lakkaraju 🔗 |
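The flavour of such analytical estimators can be conveyed with the binary case under a local linear approximation: the probability that Gaussian input noise leaves the prediction unchanged reduces to a Normal CDF of the margin over the scaled gradient norm. The multi-class estimators in the paper use the multivariate Normal CDF; the sketch below covers only this simplified binary case and is not the authors' estimator.

```python
import numpy as np
from scipy.stats import norm

def local_robustness_linear(logit, grad, sigma):
    """Approximate local robustness of a binary classifier under Gaussian noise.

    With f(x + d) ~ f(x) + grad . d and d ~ N(0, sigma^2 I), the probability
    that the predicted class is unchanged is Phi(|f(x)| / (sigma * ||grad||)).
    """
    margin = abs(logit)
    scale = sigma * np.linalg.norm(grad) + 1e-12
    return norm.cdf(margin / scale)

print(local_robustness_linear(logit=2.0, grad=np.array([0.5, -1.0]), sigma=0.5))
```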
-
|
What Works in Chest X-Ray Classification? A Case Study of Design Choices
(
Poster
)
link »
Public competitions and datasets have yielded increasingly accurate chest x-ray prediction models. The best such models now match even human radiologists on benchmarks. These models go beyond "standard" image classification techniques, and instead employ design choices specialized for the chest x-ray domain. However, as a result, each model ends up using a different, non-standardized training setup, making it unclear how individual design choices (be it the choice of model architecture, data augmentation type, or loss function) actually affect performance. So, which design choices should we use in practice? Examining a wide range of model design choices on three canonical chest x-ray benchmarks, we find that by simply leveraging a (properly tuned) model composed of standard image classification design choices, one can often match the performance of even the best domain-specific models. Moreover, starting from a "barebones," generic ResNet-50 with cross-entropy loss and no data augmentation, we discover that none of the proposed design choices (including broadly used choices like the DenseNet-121 architecture or basic data augmentation) consistently improves performance over that generic learning setup. |
Evan Vogelbaum · Logan Engstrom · Aleksander Madry 🔗 |
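The "barebones" baseline described above roughly corresponds to the following setup, written as a sketch with illustrative hyperparameters (the exact values and data pipeline are assumptions, not the paper's configuration): a generic ResNet-50, per-finding binary cross-entropy, and no data augmentation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50  # torchvision >= 0.13 API

NUM_FINDINGS = 14                         # e.g. 14 binary chest x-ray findings
model = resnet50(weights=None)            # train from scratch, no augmentation
model.fc = nn.Linear(model.fc.in_features, NUM_FINDINGS)
criterion = nn.BCEWithLogitsLoss()        # per-finding binary cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=1e-4)

def train_step(images, labels):
    """One optimization step on a batch of (image, multi-label) pairs."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```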
-
|
Towards Interpretable Classification of Leukocytes based on Deep Learning
(
Poster
)
link »
Label-free approaches are attractive in cytological imaging due to their flexibility and cost efficiency. They are supported by machine learning methods, which, despite the lack of labeling and the associated lower contrast, can classify cells with high accuracy where the human observer has little chance to discriminate them. In order to better integrate these workflows into the clinical decision-making process, this work investigates the calibration of confidence estimation for the automated classification of leukocytes. In addition, different visual explanation approaches are compared, which should bring machine decision making closer to professional healthcare applications. Furthermore, we were able to identify general detection patterns in neural networks and demonstrate the utility of the presented approaches in different scenarios of blood cell analysis. |
Stefan Röhrl · Johannes Groll · Manuel Lengl · Simon Schumann · Christian Klenk · Dominik Heim · Martin Knopp · Oliver Hayden · Klaus Diepold 🔗 |
-
|
Why Deep Models Often Cannot Beat Non-deep Counterparts on Molecular Property Prediction?
(
Poster
)
link »
Molecular property prediction (MPP) is a crucial task in the AI-driven Drug Discovery (AIDD) pipeline, which has recently gained considerable attention thanks to advances in deep learning. However, recent research has revealed that deep models struggle to beat traditional non-deep ones on MPP. In this study, we benchmark 12 representative models (3 non-deep models and 9 deep models) on 14 molecule datasets. Through the most comprehensive study to date, we make the following key observations: (i) deep models are generally unable to outperform non-deep ones; (ii) the failure of deep models on MPP cannot be solely attributed to the small size of molecular datasets; what matters is the irregular molecule data pattern; (iii) in particular, tree models using molecular fingerprints as inputs tend to perform better than other competitors. Furthermore, we conduct extensive empirical investigations into the unique patterns of molecule data and the inductive biases of various models underlying these phenomena. |
Jun Xia · Lecheng Zhang · Xiao Zhu · Stan Z Li 🔗 |
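Observation (iii) corresponds to a very simple pipeline: Morgan fingerprints from RDKit fed into a tree ensemble. The sketch below uses a random forest on toy SMILES strings; the dataset handling and hyperparameters are illustrative assumptions, not the benchmark setup from the paper.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def morgan_fingerprint(smiles, radius=2, n_bits=2048):
    """ECFP-style Morgan fingerprint as a dense numpy bit vector."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    arr = np.zeros((n_bits,), dtype=np.int8)
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

# Toy data; in practice X and y come from a molecular property benchmark.
smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
y = [0, 1, 0, 1]
X = np.stack([morgan_fingerprint(s) for s in smiles])
clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
```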
-
|
Transfer Causal Learning: Causal Effect Estimation with Knowledge Transfer
(
Poster
)
link »
We study the novel problem of improving causal effect estimation accuracy with the help of knowledge transfer under the same covariate (or feature) space setting, i.e., homogeneous transfer learning (TL), which we refer to as the Transfer Causal Learning (TCL) problem. While most recent efforts in adapting TL techniques to estimate average causal effect (ACE) have focused on the heterogeneous covariate space setting, those methods are inadequate for tackling the TCL problem since their algorithm designs are based on the decomposition into shared and domain-specific covariate spaces. To address this issue, we propose a generic framework called $\ell_1$-TCL, which incorporates $\ell_1$-regularized TL for nuisance parameter estimation and downstream plug-in ACE estimators, including outcome regression, inverse probability weighted, and doubly robust estimators. Most importantly, with the help of the Lasso for high-dimensional regression, we establish non-asymptotic recovery guarantees for the generalized linear model (GLM) under the sparsity assumption for the proposed $\ell_1$-TCL. From an empirical perspective, $\ell_1$-TCL is a generic learning framework that can incorporate not only GLMs but also many recently developed non-parametric methods, which can enhance robustness to model mis-specification. We demonstrate this empirical benefit through extensive numerical simulation by incorporating both GLMs and recent neural network-based approaches in $\ell_1$-TCL, which shows improved performance compared with existing TL approaches for ACE estimation. Furthermore, our $\ell_1$-TCL framework is applied to a real study, revealing that vasopressor therapy could prevent 28-day mortality within septic patients, which all baseline approaches fail to show.
|
Song Wei · Ronald Moore · Hanyu Zhang · Yao Xie · Rishikesan Kamaleswaran 🔗 |
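To make the plug-in estimators concrete, here is a sketch of a doubly robust ACE estimate with $\ell_1$-regularized nuisance models. The knowledge-transfer step (borrowing strength from a source domain) is omitted, so this is only a target-domain backbone under simple assumptions, not the $\ell_1$-TCL algorithm itself.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LogisticRegressionCV

def doubly_robust_ace(X, treatment, y):
    """Doubly robust (AIPW) plug-in estimate of the average causal effect,
    with l1-regularized nuisance models."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(treatment, dtype=float)
    y = np.asarray(y, dtype=float)
    # Propensity model P(T=1 | X), l1-penalized.
    prop = LogisticRegressionCV(penalty="l1", solver="liblinear", cv=5)
    e = np.clip(prop.fit(X, t).predict_proba(X)[:, 1], 1e-3, 1 - 1e-3)
    # Outcome models E[Y | X, T=1] and E[Y | X, T=0], l1-penalized.
    mu1 = LassoCV(cv=5).fit(X[t == 1], y[t == 1]).predict(X)
    mu0 = LassoCV(cv=5).fit(X[t == 0], y[t == 0]).predict(X)
    aipw = (mu1 - mu0
            + t * (y - mu1) / e
            - (1 - t) * (y - mu0) / (1 - e))
    return float(aipw.mean())
```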
-
|
Discovering Mental Health Research Topics with Topic Modeling
(
Poster
)
link »
Mental health significantly influences various aspects of our daily lives, and its importance has been increasingly recognized by the research community and the general public, particularly in the wake of the COVID-19 pandemic. This heightened interest is evident in the growing number of publications dedicated to mental health in the past decade. In this study, our goal is to identify general trends in the field and pinpoint high-impact research topics by analyzing a large dataset of mental health research papers. To accomplish this, we collected abstracts from various databases and leveraged a learned Sentence-BERT based embedding model to analyze the evolution of topics over time. Our dataset comprises 96,676 research papers pertaining to mental health, enabling us to examine the relationships between different topics using their abstracts. To evaluate the effectiveness of our proposed model, we compared it against two other state-of-the-art methods: Top2Vec and LDA-BERT. Our model demonstrated superior performance in metrics such as TD Inv. RBO (Inverse Rank-Biased Overlap) and TC Cv (Coefficient of Topic Coherence). To enhance our analysis, we also generated word clouds to provide a comprehensive overview of the machine learning models applied in mental health research, shedding light on commonly utilized techniques and emerging trends. Furthermore, we provide a GitHub link to the dataset used in this paper, ensuring its accessibility for further research endeavors. |
Xin Gao · Cem Sazara 🔗 |
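A minimal sketch of an embedding-plus-clustering pipeline of this kind: encode abstracts with a Sentence-BERT model, cluster the embeddings, and surface frequent terms per cluster as rough topic labels. The model name and clustering choice are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

abstracts = [
    "anxiety and depression trends during covid lockdowns",
    "screening for depression with machine learning on survey data",
    "wearable sensing of sleep quality and mood",
    "burnout and wellbeing among healthcare workers",
]

# 1) Embed abstracts with a Sentence-BERT model (model name is illustrative).
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(abstracts)

# 2) Cluster the embeddings into topics.
n_topics = 2
labels = KMeans(n_clusters=n_topics, n_init=10,
                random_state=0).fit_predict(embeddings)

# 3) Describe each topic by its most frequent terms.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(abstracts)
terms = np.array(vectorizer.get_feature_names_out())
for topic in range(n_topics):
    rows = np.where(labels == topic)[0]
    topic_counts = np.asarray(counts[rows].sum(axis=0)).ravel()
    print(f"topic {topic}:", terms[np.argsort(-topic_counts)[:3]])
```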
-
|
Generalizing Neural Additive Models via Statistical Multimodal Analysis
(
Poster
)
link »
Generalized Additive Models (GAM) and Neural Additive Models (NAM) have gained a lot of attention for addressing trade-offs between accuracy and interpretability of machine learning models. Although the field has focused on minimizing trade-offs between accuracy and interpretability, the limitation of GAM or NAM on data that has multiple subpopulations, differentiated by latent variables with distinctive relationships between features and outputs, has rarely been addressed. The main reason behind this limitation is that these models collapse multiple relationships by being forced to fit the data in a unimodal fashion. Here, we address and describe the overlooked limitation of "one-fits-all" interpretable methods and propose a Mixture of Neural Additive Models (MNAM) to overcome it. The proposed MNAM learns relationships between features and outputs in a multimodal fashion and assigns a probability to each mode. Based on a subpopulation, MNAM will activate one or more matching modes by increasing their probability. Thus, the objective of MNAM is to learn multiple relationships and activate the right relationships by automatically identifying subpopulations of interest. Similar to how GAM and NAM have fixed relationships between features and outputs, MNAM will maintain interpretability by having multiple fixed relationships. We demonstrate how the proposed MNAM balances between rich representations and interpretability with numerous empirical observations and pedagogical studies. The code is available at (to be completed upon acceptance). |
Young Kyung Kim · Juan Di Martino · Guillermo Sapiro 🔗 |
-
|
Unsupervised Discovery of Steerable Factors in Graphs
(
Poster
)
link »
Deep generative models have been widely developed for graph data such as molecular graphs and point clouds. Yet, much less investigation has been carried out on understanding the learned latent space of deep graph generative models. Such understanding can open up a unified perspective and provide guidelines for essential tasks like controllable generation. To this end, this work develops a method called GraphCG for the unsupervised discovery of steerable factors in the latent space of deep graph generative models (DGMs); GraphCG is able to steer key graph factors such as functional group modification in molecules and engine updates in airplane point clouds. We first examine the representation space of DGMs trained on graphs and observe that the learned representation space is not perfectly disentangled. Based on this observation, GraphCG learns semantic-rich directions via maximizing the corresponding mutual information, where graphs edited along the same direction will share certain steerable factors. We conduct experiments on two types of graph data, molecular graphs and point clouds. Both the quantitative and qualitative results show the effectiveness of GraphCG in discovering steerable factors. |
Shengchao Liu · Chengpeng Wang · Weili Nie · Hanchen Wang · Jiarui Lu · Bolei Zhou · Jian Tang 🔗 |
Author Information
Weina Jin (Simon Fraser University)
Ramin Zabih (Cornell University)
S. Kevin Zhou (Institute of Computing Technology, Chinese Academy of Sciences)
Yuyin Zhou (Johns Hopkins University)
Xiaoxiao Li (University of British Columbia)
Yifan Peng (Weill Cornell Medicine)
Zongwei Zhou (Johns Hopkins University)
Yucheng Tang (Vanderbilt University)
Yuzhe Yang (MIT)
Agni Kumar (Apple)
Agni Kumar is a Research Scientist on Apple’s Health AI team. She studied at MIT, graduating with an M.Eng. in Machine Learning and B.S. degrees in Mathematics and Computer Science. Her thesis on modeling the spread of healthcare-associated infections led to joining projects at Apple with applied health focuses, specifically on understanding cognitive decline from device usage data and discerning respiratory rate from wearable microphone audio. She has published hierarchical reinforcement learning research and predictive analytics work in conferences and journals, including EMBC, PLOS Computational Biology and Telehealth and Medicine Today. She was a workshop organizer for ICML’s first-ever *Computational Approaches to Mental Health* workshop in 2021. She has also volunteered at WiML workshops and served as a reviewer for NeurIPS. For joy, Agni leads an Apple-wide global diversity network focused on encouraging mindfulness to find pockets of peace each day.
More from the Same Authors
- 2021 : TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation »
  Jie-Neng Chen · Yongyi Lu · Qihang Yu · Xiangde Luo · Ehsan Adeli · Yan Wang · Le Lu · Alan L Yuille · Yuyin Zhou
- 2021 : One Map Does Not Fit All: Evaluating Saliency Map Explanation on Multi-Modal Medical Images »
  Weina Jin · Xiaoxiao Li · Ghassan Hamarneh
- 2021 : One Map Does Not Fit All: Evaluating Saliency Map Explanation on Multi-Modal Medical Images »
  Weina Jin · Xiaoxiao Li · Ghassan Hamarneh
- 2023 : Panel and Closing »
  Julia Schnabel · Andreas Maier · Pallavi Tiwari · Oliver Stegle · Daniel Rueckert · Ulas Bagci · Xiaoxiao Li
- 2023 Poster: Change is Hard: A Closer Look at Subpopulation Shift »
  Yuzhe Yang · Haoran Zhang · Dina Katabi · Marzyeh Ghassemi
- 2022 Workshop: 2nd Workshop on Interpretable Machine Learning in Healthcare (IMLH) »
  Ramin Zabih · S. Kevin Zhou · Weina Jin · Yuyin Zhou · Ipek Oguz · Xiaoxiao Li · Yifan Peng · Zongwei Zhou · Yucheng Tang
- 2021 Workshop: Workshop on Computational Approaches to Mental Health @ ICML 2021 »
  Niranjani Prasad · Caroline Weis · Shems Saleh · Rosanne Liu · Jake Vasilakes · Agni Kumar · Tianlin Zhang · Ida Momennejad · Danielle Belgrave
- 2021 Workshop: Interpretable Machine Learning in Healthcare »
  Yuyin Zhou · Xiaoxiao Li · Vicky Yao · Pengtao Xie · DOU QI · Nicha Dvornek · Julia Schnabel · Judy Wawira · Yifan Peng · Ronald Summers · Alan Karthikesalingam · Lei Xing · Eric Xing
- 2021 : Welcoming remarks and introduction »
  Yuyin Zhou
- 2021 Poster: Delving into Deep Imbalanced Regression »
  Yuzhe Yang · Kaiwen Zha · YINGCONG CHEN · Hao Wang · Dina Katabi
- 2021 Oral: Delving into Deep Imbalanced Regression »
  Yuzhe Yang · Kaiwen Zha · YINGCONG CHEN · Hao Wang · Dina Katabi
- 2019 Poster: ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation »
  Yuzhe Yang · GUO ZHANG · Zhi Xu · Dina Katabi
- 2019 Oral: ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation »
  Yuzhe Yang · GUO ZHANG · Zhi Xu · Dina Katabi