

Poster in Workshop: Accessible and Efficient Foundation Models for Biological Discovery

Pre-training of Single-cell Language Models through Genetic Pathway Learning

Xuxi Chen · Zhangyang “Atlas” Wang · Marinka Zitnik · Manolis Kellis · Tianlong Chen

Keywords: [ scRNA-seq; foundation model ]


Abstract: State-of-the-art single-cell RNA sequencing (scRNA-seq) techniques have substantially increased the depth and richness of scRNA-seq datasets, enabling a more comprehensive understanding of cellular biology and driving advances across a spectrum of research domains. In this work, we propose a novel $\textbf{S}$ingle-$\textbf{c}$ell Pre-trained $\textbf{L}$anguage $\textbf{M}$odel via Genetic $\textbf{Pa}$thway Learning, named scPaLM, that effectively harnesses scRNA-seq data and supports a variety of downstream applications. scPaLM integrates three key designs: (1) an embedding process that represents gene information with a reduced token count, improving computational efficiency; (2) a genetic pathway learning module that learns discrete representations, modeling collective gene behaviors in a data-driven way; and (3) a training methodology that progressively aggregates cell representations into a designated token, with a tailored masking strategy and a token-level contrastive regularizer. scPaLM outperforms baselines by clear margins on various downstream tasks, including cell type annotation, imputation, and cancer drug response prediction. Code will be made public.
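The abstract does not detail these modules, and the code has not yet been released, so the following is only a speculative sketch of how components (2) and (3) might be realized: it assumes a VQ-VAE-style codebook for the discrete pathway representations and an InfoNCE objective for the token-level contrastive regularizer. All names (`PathwayQuantizer`, `token_contrastive_loss`, `num_pathways`) and hyperparameters are hypothetical, not taken from the paper.

```python
# Speculative sketch, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PathwayQuantizer(nn.Module):
    """Map continuous token embeddings onto a learned discrete codebook so
    that each code can act as a data-driven 'pathway' token (hypothetical)."""

    def __init__(self, num_pathways: int = 256, dim: int = 128, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_pathways, dim)
        nn.init.uniform_(self.codebook.weight, -1.0 / num_pathways, 1.0 / num_pathways)
        self.beta = beta  # commitment-loss weight, as in VQ-VAE

    def forward(self, z: torch.Tensor):
        # z: (batch, tokens, dim) continuous token embeddings
        flat = z.reshape(-1, z.size(-1))                      # (B*T, D)
        dist = torch.cdist(flat, self.codebook.weight)        # (B*T, K) distances
        idx = dist.argmin(dim=-1).view(z.shape[:-1])          # nearest code per token
        z_q = self.codebook(idx)                              # (B, T, D) quantized
        # Codebook loss pulls codes toward encodings; commitment loss does the reverse.
        vq_loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        # Straight-through estimator: copy gradients through the discrete lookup.
        z_q = z + (z_q - z).detach()
        return z_q, idx, vq_loss


def token_contrastive_loss(tokens_a: torch.Tensor, tokens_b: torch.Tensor,
                           temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE over matching token positions from two views of the same cells."""
    a = F.normalize(tokens_a.flatten(0, 1), dim=-1)           # (B*T, D)
    b = F.normalize(tokens_b.flatten(0, 1), dim=-1)
    logits = a @ b.t() / temperature                          # positives on the diagonal
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)


# Usage with random stand-in data: 4 cells, 32 tokens each, 128-dim embeddings.
z = torch.randn(4, 32, 128)
quantizer = PathwayQuantizer()
z_q, codes, vq_loss = quantizer(z)
z_q2, _, _ = quantizer(z + 0.01 * torch.randn_like(z))        # a second, perturbed view
loss = vq_loss + token_contrastive_loss(z_q, z_q2)
```

The straight-through estimator lets gradients bypass the non-differentiable nearest-code lookup, which is the standard trick for training discrete codebooks end to end; whether scPaLM uses this mechanism or another discretization scheme is not stated in the abstract.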
