Workshop
Accessible and Efficient Foundation Models for Biological Discovery
Navid NaderiAlizadeh · Samuel Sledzieski · Kanchan Jha · Meghana Kshirsagar · Rohit Singh · Quincey Justman
Stolz 1
Sat 27 Jul, midnight PDT
There is a growing gap between machine learning (ML) research on biology-inspired problems and the actual broad-based use of ML in the lab or the clinic. This gap is especially pressing in the context of foundation models and other large ML models. Accessibility and efficiency concerns limit the adoption of these models by biologists and clinicians. Large ML models may require extensive GPU clusters to train, while most biological labs only have access to much more modest computational resources. The usability of these models for non-expert users is also a concern, as is the need to iteratively adapt these models based on lab discoveries. This workshop seeks to bring ML and biomedical researchers together to identify interdisciplinary approaches to design and apply large, complex ML models for biomedical discovery. We invite researchers from academia and industry to submit original papers to bridge the accessibility and efficiency gap between ML research and wet lab use. All accepted papers will be invited to present posters at the workshop, and a few will be invited to give individual spotlight presentations.
Schedule
Sat 12:00 a.m. - 12:10 a.m. | Opening Remarks
Sat 12:10 a.m. - 12:40 a.m. | Invited Talk - Burkhard Rost (Artificial Intelligence Deciphers the Code of Life Written in Proteins)
Sat 12:40 a.m. - 1:30 a.m. | Spotlight Session 1
Sat 1:30 a.m. - 1:40 a.m. | Coffee Break
Sat 1:40 a.m. - 2:10 a.m. | Invited Talk - David Page (Perspectives on a Possible Foundation Model for Health)
Sat 2:10 a.m. - 2:40 a.m. | Invited Talk - Bryan Bryson (Learning the Rules of Pathogen-Derived Antigen Presentation on MHC-I and MHC-II)
Sat 2:40 a.m. - 3:30 a.m. | Spotlight Session 2
Sat 3:30 a.m. - 5:00 a.m. | Lunch Break
Sat 5:00 a.m. - 5:30 a.m. | Invited Talk - Lenore Cowen (Learning Protein Function and Organization in Non-Model Organisms with Philharmonic)
Sat 5:30 a.m. - 6:30 a.m. | Panel Discussion
Sat 6:30 a.m. - 7:00 a.m. | Coffee Break
Sat 7:00 a.m. - 7:50 a.m. | Poster Session
Sat 7:50 a.m. - 8:00 a.m. | Closing Remarks
Accepted Papers
- Fine-tuning the ESM2 protein language model to understand the functional impact of missense variants (Poster) | Ali Saadat · Jacques Fellay
- MiniMol: A Parameter-Efficient Foundation Model for Molecular Learning (Spotlight) | Kerstin Klaser · Błażej Banaszewski · Samuel Maddrell-Mander · Callum McLean · Luis Müller · Ali Parviz · Shenyang (Andy) Huang · Andrew Fitzgibbon
- Graph2Token: Make LLMs Understand Molecule Graphs (Poster) | Runze Wang · Mingqi Yang · Yanming Shen
- MSA Pairing Transformer: protein interaction partner prediction with few-shot contrastive learning (Poster) | Alex Hawkins-Hooker · Daniel Burkhardt Cerigo · Umberto Lupo · David Jones · Brooks Paige
- Training Compute-Optimal Protein Language Models (Spotlight) | Xingyi Cheng · Bo Chen · Pan Li · Jing Gong · Jie Tang · Le Song
- High-Resolution In Silico Painting with Generative Models (Poster) | Trang Le
- xMINT: A Multimodal Integration Transformer for Xenium Gene Imputation (Poster) | Xiaohui Jiang · Yuxia Xie · Jichun Xie
- SWUS: Active Learning with Structure Weighted Uncertainty Score (Poster) | Andrea Karlova · Brooks Paige
- ABodyBuilder3: Improved and scalable antibody structure predictions (Poster) | Henry Kenlay · Frederic Dreyer · Daniel Cutting · Daniel Nissley · Charlotte Deane
- BioinformaticsBench: A collaboratively built large language model benchmark for Bioinformatics reasoning (Poster) | Varuni Sarwal · Seungmo Lee · Rosemary He · Aingela Kattapuram · Mandy Wang · Eleazar Eskin · Wei Wang · Serghei Mangul
- RFamLlama: an efficient conditional language model for RNA sequence generation across diverse structural families (Poster) | Jinyuan Sun · Han Li · Yifan Deng
- Compressing the Latent Space of Single-Sequence Protein Predictors for Multimodal Generation (Poster) | Amy X. Lu · Wilson Yan · Vladimir Gligorijevic · Pieter Abbeel · Kevin Yang · Nathan Frey
- 2Bits of Protein: Efficient Protein Language Models at the Scale of 2-bits (Poster) | Ollie Turnbull · Mohamed Baioumy · Charlotte Deane
- Rethinking Molecular Design: Integrating Latent Variable and Auto-Regressive Models for Enhanced Goal Directed Generation (Poster) | Arthur-Louis Heath · maolaaisha aminanmu · Michael Krauthammer
- Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling (Poster) | Yair Schiff · Chia Hsiang Kao · Aaron Gokaslan · Tri Dao · Albert Gu · Volodymyr Kuleshov
- Are Protein Language Models Compute Optimal? (Poster) | Yaiza Serrano · Alvaro Ciudad Serrano · Alexis Molina
- Identifying Biological Priors and Structure in Single-Cell Foundation Models (Poster) | Flavia Pedrocchi · Stefan Stark · Gunnar Ratsch · Amir Joudaki
- Cramming Protein Language Model Training in 24 GPU Hours (Spotlight) | Nathan Frey · Taylor Joren · Aya Ismail · Allen Goodman · Richard Bonneau · Kyunghyun Cho · Vladimir Gligorijevic
- Learning Generative Population Models From Multiple Clinical Datasets Via Probabilistic Programming (Poster) | João Loula · Katie Collins · Ulrich Schaechtle · Josh Tenenbaum · Adrian Weller · Feras Saad · Timothy O'Donnell · Vikash Mansinghka
- Pre-training of Single-cell Language Models through Genetic Pathway Learning (Poster) | Xuxi Chen · Zhangyang “Atlas” Wang · Marinka Zitnik · Manolis Kellis · Tianlong Chen
- Interactome-scale comparison of co-immunoprecipitation and yeast two-hybrid assays for protein interaction prediction (Poster) | Kapil Devkota · Lenore Cowen · Rohit Singh
- Generative Model for Small Molecules with Latent Space RL Fine-Tuning to Protein Targets (Poster) | Ulrich Armel Mbou Sob · Qiulin Li · Miguel Arbesú · Oliver Bent · Andries Smit · Arnu Pretorius
- Likelihood-based fine-tuning of protein language models for few-shot fitness prediction and design (Spotlight) | Alex Hawkins-Hooker · Jakub Kmec · Oliver Bent · Paul Duckworth
- One-Versus-Others Attention: Scalable Multimodal Integration for Biomedical Data (Spotlight) | Michal Golovanevsky · Eva Schiller · Akira Nair · Ritambhara Singh · Carsten Eickhoff
- Geometric Algebra based encoding for graph prompting (Poster) | Sotirios Panagiotis Chytas · Rudrasis Chakraborty · Vikas Singh
- FusOn-pLM: A Fusion Oncoprotein-Specific Language Model via Focused Probabilistic Masking (Poster) | Sophia Vincoff · Shrey Goel · Kseniia Kholina · Pranam Chatterjee
- Injecting Hierarchical Biological Priors into Graph Neural Networks for Flow Cytometry Prediction (Poster) | Fatemeh Nassajian Mojarrad · Lorenzo Bini · Thomas Matthes · Stephane Marchand-Maillet
- ProtMamba: a homology-aware but alignment-free protein state space model (Spotlight) | Damiano Sgarbossa · Cyril Malbranke · Anne-Florence Bitbol
- Simple and Effective Masked Diffusion Language Models (Spotlight) | Subham Sekhar Sahoo · Marianne Arriola · Aaron Gokaslan · Edgar Marroquin · Alexander Rush · Yair Schiff · Justin Chiu · Volodymyr Kuleshov
- MolEval: An Evaluation Toolkit for Molecular Embeddings via LLMs (Poster) | Shaghayegh Sadeghi · Ali Forooghi · Jianguo Lu · Alioune Ngom
- scTree: Discovering Cellular Hierarchies in the Presence of Batch Effects in scRNA-seq Data (Spotlight) | Moritz Vandenhirtz · Florian Barkmann · Laura Manduchi · Valentina Boeva · Julia Vogt
- Prot2Token: A multi-task framework for protein language processing using autoregressive language modeling (Poster) | Pourmirzaei · Farzaneh Esmaili · Mohammadreza Pourmirzaeioliaei · Duolin Wang · Dong Xu
- Multi-Task Training Increases Native Sequence Recovery of Antigen-Specific T-cell Receptor Sequences (Poster) | Dhuvarakesh Karthikeyan · Alex Rubinsteyn
- A generative foundation model for antibody sequence understanding (Poster) | Justin Barton · Aretas Gaspariunas · David Yadin · Jorge Dias · Francesca Nice · Danielle Minns · Olivia Snudden · Chelsea Povall · Sara Tomas · Harry Dobson · James Farmery · Jinwoo Leem · Jacob Galson
- PLUTO: Pathology-Universal Transformer (Poster) | Dinkar Juyal · Harshith Padigela · Chintan Shah · Daniel Shenker · Natalia Harguindeguy · Yi Liu · Blake Martin · Yibo Zhang · Michael Nercessian · Miles Markey · Isaac Finberg · Kelsey Luu · Daniel Borders · Syed Ashar Javed · Emma Krause · Raymond Biju · Aashish Sood · Allen Ma · Jackson Nyman · John Shamshoian · Guillaume Chhor · Darpan Sanghavi · Marc Thibault · Limin Yu · Fedaa Najdawi · Jennifer Hipp · Darren Fahy · Benjamin Glass · Eric Walk · John Abel · Harsha pokkalla · Andrew Beck · Sean Grullon
- Enhancing Single-Cell VAE Latent Space via Semi-Supervision (Poster) | Meichen Gong · Konstantin Ivanov · Merja Heinäniemi · Ville Hautamäki
- Towards generalizable particle picking in Cryo-EM images by leveraging Masked AutoEncoders (Poster) | Andreas Zamanos · Panagiotis Koromilas · Giorgos Bouritsas · Panagiotis Kastritis · Yannis Panagakis