

Poster

A Cross Modal Knowledge Distillation & Data Augmentation Recipe for Improving Transcriptomics Representations through Morphological Features

Ihab Bendidi · Yassir El Mesbahi · Alisandra Denton · Karush Suri · Kian Kenyon-Dean · Auguste Genovesio · Emmanuel Noutahi

West Exhibition Hall B2-B3 #W-319
Tue 15 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

Understanding cellular responses to stimuli is crucial for biological discovery and drug development. Transcriptomics provides interpretable, gene-level insights, while microscopy imaging offers rich predictive features but is harder to interpret. Weakly paired datasets, where samples share biological states, enable multimodal learning but are scarce, limiting their utility for training and multimodal inference. We propose a framework to enhance transcriptomics by distilling knowledge from microscopy images. Using weakly paired data, our method aligns and binds modalities, enriching gene expression representations with morphological information. To address data scarcity, we introduce (1) Semi-Clipped, an adaptation of CLIP for cross-modal distillation using pretrained foundation models, achieving state-of-the-art results, and (2) PEA (Perturbation Embedding Augmentation), a novel augmentation technique that enhances transcriptomics data while preserving inherent biological information. These strategies improve the predictive power and retain the interpretability of transcriptomics, enabling rich unimodal representations for complex biological tasks.
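
The poster page carries no code, but a minimal sketch may help make the two contributions concrete. Assuming PyTorch and a shared embedding space produced by frozen pretrained encoders, the functions below illustrate (1) a CLIP-style symmetric contrastive objective of the kind Semi-Clipped builds on, and (2) a PEA-like augmentation that mixes embeddings only within the same perturbation label. All function names, signatures, and hyperparameters here are hypothetical illustrations, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(tx_emb: torch.Tensor, img_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of weakly paired samples.

    tx_emb:  (B, D) transcriptomics embeddings in the shared space.
    img_emb: (B, D) morphology embeddings in the same space.
    """
    tx = F.normalize(tx_emb, dim=-1)
    im = F.normalize(img_emb, dim=-1)
    logits = tx @ im.t() / temperature              # (B, B) similarities
    targets = torch.arange(tx.size(0), device=tx.device)
    # Match each transcriptomics sample to its paired image and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def pea_like_augment(emb: torch.Tensor, labels: torch.Tensor,
                     alpha: float = 0.2) -> torch.Tensor:
    """Hypothetical embedding augmentation: interpolate each embedding
    toward another embedding that shares its perturbation label, so the
    synthetic sample stays within the same biological state."""
    idx = torch.randperm(emb.size(0), device=emb.device)
    same = (labels == labels[idx]).unsqueeze(1)     # (B, 1) same-label mask
    lam = alpha * torch.rand(emb.size(0), 1, device=emb.device)
    mixed = (1.0 - lam) * emb + lam * emb[idx]
    return torch.where(same, mixed, emb)            # leave unmatched rows as-is
```

In practice, Semi-Clipped distills knowledge from pretrained foundation models rather than training encoders from scratch, and the paper's actual PEA augmentation may differ from this within-label interpolation; the sketch only conveys the shape of the two ideas.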

Lay Summary:

Understanding how cells respond to their environment is important for science and medicine. To study this, scientists often look at two types of data: gene activity (transcriptomics) and cell images (microscopy). Gene data is easy to interpret but may miss some details, while images show rich detail but are harder to understand. It is rare to have both types of data for the same samples, which limits their combined use. This research introduces a new way to make gene data more informative by "borrowing" insights from images. Even when the two types of data aren't perfectly matched, the method learns to connect them. It does this in two key ways: first, by adapting a popular AI technique (CLIP) to combine these data types and achieve top results, and second, by creating a new method to expand gene data without losing important biological meaning. The result is gene data that stays clear and understandable but also gains the rich detail typically found only in images.
