Spotlight in Workshop: Accessible and Efficient Foundation Models for Biological Discovery
MiniMol: A Parameter-Efficient Foundation Model for Molecular Learning
Kerstin Klaser · Błażej Banaszewski · Samuel Maddrell-Mander · Callum McLean · Luis Müller · Ali Parviz · Shenyang (Andy) Huang · Andrew Fitzgibbon
Keywords: [ Parameter-Efficient Foundation Model ] [ Biological Discovery ] [ Molecular Learning ]
We propose MiniMol, an open-source foundation model for molecular machine learning that outperforms the best previous foundation model on 17/22 downstream tasks from the Therapeutic Data Commons (TDC) ADMET group while having ten times fewer parameters. This efficiency is achieved through the use of a graph neural network (GNN), pre-trained on about 3,300 sparsely defined graph- and node-level tasks, using a dataset of 6 million molecules and 500 million quantum and biological labels. The model learns via multi-task, multi-label supervised training to produce embeddings that generalize well to a wide range of biological tasks, and that can be used efficiently by simple Multi-Layer Perceptron (MLP) models on downstream tasks, as demonstrated by our experiments.
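As a rough illustration of the downstream protocol described in the abstract, the sketch below freezes a pretrained molecular encoder and fits a small MLP probe on its embeddings. The `embed_molecules` helper, the embedding dimension, and the toy data are hypothetical stand-ins, not MiniMol's actual interface; this is a minimal sketch of the probing setup, assuming a PyTorch workflow.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a frozen pretrained encoder: maps a batch of
# molecules (here SMILES strings) to fixed-size embeddings. MiniMol's real
# interface may differ; this only illustrates the probing protocol.
def embed_molecules(smiles: list[str], dim: int = 512) -> torch.Tensor:
    return torch.randn(len(smiles), dim)  # placeholder embeddings

class MLPProbe(nn.Module):
    """Small task head trained on top of frozen molecular embeddings."""
    def __init__(self, in_dim: int = 512, hidden: int = 256, out_dim: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Toy downstream task: one binary label per molecule (e.g. an ADMET endpoint).
smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
labels = torch.tensor([[1.0], [0.0], [1.0], [0.0]])

x = embed_molecules(smiles)          # embeddings stay fixed; encoder is frozen
probe = MLPProbe(in_dim=x.shape[1])
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(100):                 # only the lightweight MLP head is trained
    optimizer.zero_grad()
    loss = loss_fn(probe(x), labels)
    loss.backward()
    optimizer.step()
```

Keeping the encoder frozen and training only a small MLP per task is what makes the protocol cheap to run across the 22 TDC ADMET benchmarks referenced above.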