ConEx: Human-Interpretable Saliency Maps via Concept-Aware Attribution
Abstract
Many visual explanation methods in computer vision highlight pixel importance but struggle to link these low-level cues to semantically meaningful concepts, limiting their interpretability and trustworthiness. We introduce Concept-based Explanations (ConEx), a novel framework that bridges saliency visualization with concept-based reasoning to provide both local and global interpretability. ConEx automatically discovers class-specific concepts and represents them through concept activation vectors (CAVs), learned without manual supervision via an architecture-specific masking mechanism that suppresses noise introduced by segmentation masks, thereby enhancing concept purity. Locally, ConEx generates saliency maps that reveal where each concept appears in the image and how it contributes to the prediction; globally, it identifies the most influential concepts for each class. To evaluate the reliability of the learned concepts, we propose two complementary metrics, Vector-Concept Match (VCM) and Concept-Class Match (CCM), which quantify how well each learned vector aligns with its target concept and how well each concept aligns with its class, enabling direct comparison with existing methods. Extensive experiments across diverse datasets and architectures demonstrate that ConEx achieves state-of-the-art performance on faithfulness, segmentation, and concept-quality benchmarks. Human studies further confirm that the discovered concepts are interpretable, distinctive, and aligned with human understanding. Overall, ConEx advances the field toward truly interpretable, concept-grounded explanations for vision models.