Transitive Representation Learning Enhances Histopathology Annotation
Abstract
AI-driven disease characterization in histopathology promises to assist in clinical decision making, but its performance is limited by the scarcity of detailed annotations. In contrast, single-cell gene expression provides expressive and interpretable labels that compensate for this scarcity, but assays are costly and rarely acquired in clinical workflows. To overcome this gap, we propose to bridge these data sources using a trimodal contrastive learning framework that aligns histopathology images, gene expression profiles, and natural-language descriptions. Our training data combines atlas-scale datasets of (i) spatially-resolved gene expression paired with histopathology images, and (ii) single-cell gene expression with curated annotations. Together, these data induce an alignment between the image and text modalities, which we leverage for zero-shot image annotation tasks, such as the identification of immune cells. We present a sufficient condition under which this transfer can succeed and assess the performance of our approach against established baselines. We predict cell types at 15.4\% improved relative AUROC over leading pathology vision-language models. Our method also exhibits significant gains across diverse prediction tasks in low-data regimes when combining training data from all three modality pairs. Our work thus establishes transitive representation learning as an effective strategy to enhance histopathology interpretation.
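The transitive alignment described above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a standard symmetric InfoNCE contrastive loss (as in CLIP-style training) applied to two modality pairs that share gene expression as the anchor, so that image and text embeddings become aligned without ever being paired directly. All function names and the toy data below are hypothetical.

```python
import numpy as np


def _cross_entropy(logits, labels):
    # Mean cross-entropy over rows of a logit matrix (numerically stable).
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def info_nce(a, b, temperature=0.1):
    # Symmetric InfoNCE loss: row i of `a` is the positive match for row i of `b`.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature
    labels = np.arange(len(a))
    return 0.5 * (_cross_entropy(logits, labels) + _cross_entropy(logits.T, labels))


def trimodal_loss(img_emb, expr_emb, txt_emb):
    # Only image<->expression and expression<->text pairs are supervised;
    # image<->text alignment emerges transitively through the shared anchor.
    return info_nce(img_emb, expr_emb) + info_nce(expr_emb, txt_emb)


# Toy demonstration: expression embeddings act as the shared anchor,
# with image and text embeddings as noisy copies of their paired anchor.
rng = np.random.default_rng(0)
expr = rng.normal(size=(8, 32))
img = expr + 0.05 * rng.normal(size=expr.shape)   # image paired with expression
txt = expr + 0.05 * rng.normal(size=expr.shape)   # text paired with expression

aligned = trimodal_loss(img, expr, txt)
shuffled = trimodal_loss(img[::-1], expr, txt)    # broken image-expression pairing
```

In this toy setup, `aligned` is far lower than `shuffled`, and the induced image-text similarity matrix is diagonal-dominant even though no image-text pairs were used as supervision, which is the property the zero-shot annotation results rely on.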