Poster

Cell2Sentence: Teaching Large Language Models the Language of Biology

Daniel Levine · Syed Rizvi · Sacha Lévy · Nazreen Pallikkavaliyaveetil MohammedSheriff · David Zhang · Xingyu Chen · Sina Ghadermarzi · Ruiming Wu · Zihe Zheng · Ivan Vrkic · Anna Zhong · Daphne Raskin · Insu Han · Antonio Henrique de Oliveira Fonseca · Josue Ortega Caro · Amin Karbasi · Rahul Dhodapkar · David van Dijk

Hall C 4-9 #315
Wed 24 Jul 2:30 a.m. PDT — 4 a.m. PDT

Abstract:

We introduce Cell2Sentence (C2S), a novel method to directly adapt large language models to a biological context, specifically single-cell transcriptomics. By transforming gene expression data into "cell sentences," C2S bridges the gap between natural language processing and biology. We demonstrate that cell sentences enable the fine-tuning of language models for diverse tasks in biology, including cell generation, complex cell-type annotation, and direct data-driven text generation. Our experiments reveal that GPT-2, when fine-tuned with C2S, can generate biologically valid cells from cell-type inputs and accurately predict cell types from cell sentences. This illustrates that language models, through C2S fine-tuning, can acquire a significant understanding of single-cell biology while maintaining robust text generation capabilities. C2S offers a flexible, accessible framework for integrating natural language processing with transcriptomics, using existing models and libraries for a wide range of biological applications.
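The abstract does not spell out how a cell becomes a sentence. The sketch below is a minimal illustration under one assumption: that a cell sentence is the list of expressed gene names ordered from highest to lowest expression. The function name `cell_to_sentence`, the `top_k` parameter, the alphabetical tie-breaking, and the toy counts are all hypothetical choices made for this example and are not taken from the authors' code.

```python
from typing import Optional, Sequence


def cell_to_sentence(
    gene_names: Sequence[str],
    counts: Sequence[float],
    top_k: Optional[int] = None,
) -> str:
    """Turn one cell's expression vector into a space-separated 'cell sentence'.

    Assumption (not stated in the abstract): genes are ordered by decreasing
    expression, and genes with zero counts are dropped.
    """
    if len(gene_names) != len(counts):
        raise ValueError("gene_names and counts must have the same length")

    # Keep only genes that are actually expressed in this cell.
    expressed = [(name, c) for name, c in zip(gene_names, counts) if c > 0]

    # Sort by decreasing count; break ties alphabetically so output is deterministic.
    ranked = sorted(expressed, key=lambda pair: (-pair[1], pair[0]))

    # Optionally truncate to the top_k most expressed genes.
    if top_k is not None:
        ranked = ranked[:top_k]

    return " ".join(name for name, _ in ranked)


# Toy example with made-up counts for four genes in a single cell.
genes = ["CD3E", "MS4A1", "GAPDH", "ACTB"]
toy_counts = [5, 0, 12, 9]
sentence = cell_to_sentence(genes, toy_counts)
print(sentence)  # GAPDH ACTB CD3E
```

A string in this form can be passed to any standard text tokenizer, which is what would let an off-the-shelf language model such as GPT-2 be fine-tuned on cells without architectural changes.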
