scDEBART: Predicting in silico Single-Cell Perturbation Responses via Large-Scale Differential Expression Learning
Jieun Sung ⋅ Wankyu Kim
Abstract
Single-cell foundation models trained on millions of cells can learn gene expression patterns across diverse contexts. However, for predicting genetic perturbation effects, they often underperform simple regression models. We hypothesize two potential limitations: (1) prediction targets defined on dropout-prone absolute expression, and (2) pretraining objectives focused on reconstructing absolute expression within cells, which capture static co-expression patterns but may not encode how genes co-regulate in response to expression changes. We introduce $\textbf{scDEBART}$, a foundation model pretrained to predict log fold-changes (logFC) conditioned on basal expression, thereby learning how gene sets co-vary across basal states at scale. To obtain reliable estimates of expression change under technical sparsity, we compute logFC from scVI-denoised expression and restrict pretraining to genes with robust detection. Pretrained on 6.28 million expression-change profiles from 66.6 million human cells and fine-tuned on five Perturb-seq datasets, scDEBART achieves a mean enrichment factor (EF) of 11.96, 4--7$\times$ higher than scGPT and GEARS (mean EF 1.74--2.99), and 42.8\% top-1 accuracy for reverse perturbation identification, compared to near-zero accuracy for prior models. In cross-modal transfer to drug perturbations (SCIPLEX), the model shows dose-dependent enrichment (EF 2.03--4.31), suggesting partial transfer of learned regulatory patterns across modalities. Overall, these results indicate that large-scale pretraining on scVI-stabilized expression-change profiles provides a useful inductive bias for perturbation prediction.
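The preprocessing described above, computing logFC from denoised expression while restricting to robustly detected genes, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the function name `logfc_profiles`, the detection-rate threshold, and the toy matrices are hypothetical; in the actual pipeline the inputs would be scVI-denoised expression values rather than random arrays.

```python
import numpy as np

def logfc_profiles(denoised_basal, denoised_pert, detect_rate,
                   min_detect=0.1, eps=1e-6):
    """Compute log fold-change profiles from denoised expression.

    denoised_basal, denoised_pert : (cells x genes) denoised expression
    detect_rate : per-gene detection rate used to filter dropout-prone genes
    (threshold and pseudocount are illustrative assumptions)
    """
    keep = detect_rate >= min_detect  # restrict to robustly detected genes
    lfc = np.log2((denoised_pert[:, keep] + eps) /
                  (denoised_basal[:, keep] + eps))
    return lfc, keep

# Toy example: 4 cells x 5 genes; gene 0 doubled, gene 2 halved,
# gene 3 poorly detected and therefore excluded.
rng = np.random.default_rng(0)
basal = rng.random((4, 5)) + 0.5
pert = basal * np.array([2.0, 1.0, 0.5, 1.0, 1.0])
rate = np.array([0.9, 0.8, 0.95, 0.05, 0.6])
lfc, keep = logfc_profiles(basal, pert, rate)
```

Here `lfc` has one column per retained gene, with values near +1 for the doubled gene and -1 for the halved gene, matching the logFC targets the model is trained to predict.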