FLAG: Foundation model representation with Latent diffusion Alignment via Graph for spatial gene expression prediction
Abstract
Predicting spatial gene expression from routine H&E-stained images makes high-resolution molecular profiling accessible at scale, especially for large retrospective cohorts. However, current models mostly treat gene expression prediction as a series of pointwise tasks. While effective for numerical fitting, this approach overlooks biological structure: the functional coordination between genes and their organized distribution across tissue. We reframe the task as structured distribution modeling and introduce \textbf{FLAG}, a diffusion-based framework designed to preserve these biological relationships. A natural strategy for capturing such structure would be to jointly model gene expression and its spatial interactions. However, we identify a critical \textbf{Gene Dimension Curse}: such joint modeling fails in high-dimensional gene spaces. This motivates FLAG, which conditions the generative process using a novel spatial graph encoder to ensure gene-spatial topographic coherence and Gene Foundation Model (GFM) alignment to maintain high gene-gene structural fidelity. To rigorously assess our approach, we propose structural evaluation metrics, including Gene Structural Correlation (\textbf{GSC}) and Spatial Structural Correlation (\textbf{SSC}). Our experiments demonstrate that FLAG is competitive with or superior to state-of-the-art models in traditional accuracy (PCC/MSE), while achieving significantly higher structural fidelity in capturing both gene-gene and gene-spatial relationships.