Lingshu-Cell: A Generative Cellular World Model for Transcriptome Modeling toward Virtual Cells
Abstract
Modeling cellular states and predicting their responses to perturbations are central challenges in computational biology and in the development of virtual cells. Existing foundation models for single-cell transcriptomics provide powerful static representations, but they do not explicitly model the distribution of cellular states for generative simulation. We introduce Lingshu-Cell, a generative cellular world model built on masked discrete diffusion that learns transcriptomic state distributions and supports conditional simulation under perturbation. By operating directly in a discrete token space that suits the sparse, non-sequential nature of single-cell transcriptomic data, Lingshu-Cell captures complex transcriptome-wide expression dependencies across approximately 18,000 genes without relying on prior gene selection, such as restricting to highly variable genes or ranking genes by expression level. Within a unified model, Lingshu-Cell accurately reproduces transcriptomic distributions, marker-gene expression patterns, and cell-subtype proportions across diverse human tissues, and predicts whole-transcriptome responses to perturbations, including generalization to perturbations not seen during training. Beyond conditional generation, Lingshu-Cell provides informative cell representations for downstream analysis. Together, these results establish Lingshu-Cell as a flexible, unified foundation model for in silico simulation of cell states and perturbation responses, opening the way to a new paradigm in biological discovery and perturbation screening.
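To make "masked discrete diffusion over a discrete token space" concrete, here is a minimal sketch of the generic absorbing-state (masking) formulation applied to one cell's expression vector. The abstract does not specify Lingshu-Cell's tokenizer, noise schedule, or training objective, so everything below is an assumption for illustration only: the log1p binning (`tokenize_expression`, `N_BINS`, `MASK_ID`), the linear schedule in which each gene token is masked with probability `t`, and the 1/t-weighted cross-entropy on masked positions are borrowed from standard masked diffusion, not taken from the paper. The denoiser network that would produce `logits` is omitted.

```python
import numpy as np

# Hypothetical discretization (assumption): the paper's tokenizer is not described here.
N_BINS = 8          # hypothetical number of expression-level tokens (0 = not expressed)
MASK_ID = N_BINS    # extra token id reserved for the absorbing [MASK] state


def tokenize_expression(counts, n_bins=N_BINS, max_log=6.0):
    """Map per-gene raw counts (fixed gene order, no gene selection) to integer tokens.

    Zero counts map to token 0; nonzero counts are binned on a log1p scale
    into tokens 1..n_bins-1, with values above max_log clipped to the top bin.
    """
    counts = np.asarray(counts, dtype=float)
    log_vals = np.log1p(counts)
    edges = np.linspace(0.0, max_log, n_bins)           # n_bins evenly spaced edges
    tokens = np.digitize(log_vals, edges[1:-1], right=True) + 1
    tokens = np.where(counts == 0, 0, tokens)            # keep sparsity explicit
    return np.clip(tokens, 0, n_bins - 1).astype(np.int64)


def mask_tokens(tokens, t, rng):
    """Forward (noising) step: each gene token is independently replaced by
    MASK_ID with probability t, where t in [0, 1] is the diffusion time."""
    tokens = np.asarray(tokens)
    is_masked = rng.random(tokens.shape) < t
    return np.where(is_masked, MASK_ID, tokens), is_masked


def masked_diffusion_loss(logits, clean_tokens, is_masked, t):
    """Cross-entropy on masked positions only, weighted by 1/t (the ELBO
    weight for a linear masking schedule). Requires t > 0.

    logits: array of shape (n_genes, N_BINS) from a denoiser (not shown).
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    nll = -log_probs[np.arange(len(clean_tokens)), clean_tokens]
    return float((nll * is_masked).sum() / t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = np.array([0, 1, 3, 0, 25, 400])      # toy cell with 6 genes
    tokens = tokenize_expression(counts)
    noisy, is_masked = mask_tokens(tokens, t=0.5, rng=rng)
    dummy_logits = np.zeros((len(tokens), N_BINS))  # stand-in for a trained denoiser
    print(tokens, noisy, masked_diffusion_loss(dummy_logits, tokens, is_masked, 0.5))
```

Generation would run this process in reverse: start from an all-`MASK_ID` vector and let a trained denoiser iteratively fill in gene tokens, optionally conditioned on a perturbation label. Because every gene is a position rather than a step in a sequence, the masking treats genes as an unordered set, which is one reason discrete masking is a natural fit for non-sequential transcriptomic data.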