Poster
 in 
Workshop: 3rd Workshop on Interpretable Machine Learning in Healthcare (IMLH)
                        
                    
                    Prospectors: Leveraging Short Contexts to Mine Salient Objects in High-dimensional Imagery
Gautam Machiraju · Arjun Desai · James Zou · Christopher Re · Parag Mallick
Keywords: [ foundation models ] [ Interpretable ML ] [ visual grounding ] [ Explainable AI ]
High-dimensional imagery consists of high-resolution information required for end-user decision-making. Due to computational constraints, current methods for image-level classification are designed to train with image chunks or down-sampled images rather than with the full high-resolution context. While these methods achieve impressive classification performance, they often lack visual grounding and, thus, the post hoc capability to identify class-specific, salient objects under weak supervision. In this work, we (1) propose a formalized evaluation framework to assess visual grounding in high-dimensional image applications. To present a challenging benchmark, we leverage a real-world segmentation dataset for post hoc mask evaluation. We use this framework to characterize visual grounding of various baseline methods across multiple encoder classes, exploring multiple supervision regimes and architectures (e.g. ResNet, ViT). Finally, we (2) present prospector heads: a novel class of adaptation architectures designed to improve visual grounding. Prospectors leverage chunk heterogeneity to identify salient objects over long ranges and can interface with any image encoder. We find that prospectors outperform baselines by upwards of +6 balanced accuracy points and +30 precision points in a gigapixel pathology setting. Through this experimentation, we also show how prospectors can enable many classes of encoders to identify salient objects without re-training and also demonstrate their improved performance against classical explanation techniques (e.g. Attention maps).