Neuro-Fuzzy Concept Learning for Interpretable Large Multimodal Models
Abstract
Large Multimodal Models (LMMs) couple unimodal encoders with Large Language Models (LLMs) to perform complex multimodal tasks. Despite progress in the field, interpreting the internal representations of these models through explicit logic remains an open problem. To address this, we present a framework that learns token representations using a human-inspired, neuro-fuzzy approach. In this method, fuzzy rules compute activation firing strengths, which are then defuzzified to extract distinct concepts, allowing the learned representations to be interpreted directly through explicit logic. As a result, we derive "multimodal concepts" that are both semantically coherent and interpretable. We validate our approach through qualitative and quantitative experiments, demonstrating the utility of these concepts for interpreting test samples. We further evaluate the disentanglement of the learned concepts and the efficacy of their grounding in both the visual and textual domains.
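To make the pipeline the abstract alludes to more concrete (fuzzy membership functions, rule firing strengths via a t-norm, and defuzzification into concept vectors), the following is a minimal toy sketch of a generic neuro-fuzzy layer. It is not the paper's implementation: the Gaussian membership functions, the product t-norm, the weighted-average defuzzifier, and all names (gaussian_membership, firing_strengths, defuzzify) are illustrative assumptions.

```python
import numpy as np

def gaussian_membership(x, centers, widths):
    """Membership of each feature value in each rule's fuzzy sets (Gaussian MFs).
    x: (d,) token feature vector; centers/widths: (k, d) per-rule parameters."""
    return np.exp(-((x[None, :] - centers) ** 2) / (2.0 * widths ** 2))  # (k, d)

def firing_strengths(x, centers, widths):
    """Firing strength of each fuzzy rule: product t-norm over the
    per-dimension membership values."""
    mu = gaussian_membership(x, centers, widths)
    return mu.prod(axis=1)  # (k,)

def defuzzify(strengths, concept_vectors):
    """Weighted-average (centroid-style) defuzzification: normalized firing
    strengths mix the per-rule concept vectors into one concept embedding."""
    w = strengths / (strengths.sum() + 1e-12)
    return w @ concept_vectors  # (d_c,)

# Toy usage with random parameters standing in for learned ones.
rng = np.random.default_rng(0)
d, k, d_c = 8, 4, 3                      # feature dim, fuzzy rules, concept dim
x = rng.normal(size=d)                   # a (hypothetical) token representation
centers = rng.normal(size=(k, d))
widths = np.full((k, d), 1.0)
concepts = rng.normal(size=(k, d_c))     # one concept vector per rule

s = firing_strengths(x, centers, widths)
print("firing strengths:", s)
print("defuzzified concept embedding:", defuzzify(s, concepts))
```

The key property this sketch illustrates is why such a layer is interpretable: each rule's firing strength is an explicit, inspectable quantity, so a token's concept assignment can be read off as "rule i fired with strength s_i" rather than recovered post hoc from opaque activations.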