Learning effective protein representations is critical for a variety of tasks in biology, such as predicting protein function or structure. Existing approaches usually pretrain protein language models on a large number of unlabeled amino acid sequences and then finetune the models with some labeled data in downstream tasks. Despite the effectiveness of sequence-based approaches, the power of pretraining on known protein structures, which are available only in smaller numbers, has not been explored for protein property prediction, even though protein structures are known to be determinants of protein function. In this paper, we propose to pretrain protein representations according to their 3D structures. We first present a simple yet effective encoder to learn the geometric features of a protein. We then pretrain this protein graph encoder by leveraging multiview contrastive learning and different self-prediction tasks. Experimental results on both function prediction and fold classification tasks show that our proposed pretraining methods outperform or are on par with state-of-the-art sequence-based methods while using much less data. All code and models will be published upon acceptance.
Author Information
Zuobai Zhang (Mila)
Minghao Xu (Montreal Institute for Learning Algorithms, University of Montreal)
Arian Jamasb (University of Cambridge)
Vijil Chenthamarakshan (IBM Research)
Aurelie Lozano (IBM)
Payel Das (IBM Research AI)
Jian Tang (Mila)
More from the Same Authors
-
2020 : (#95 / Sess. 2) Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Protein Structures »
Arian Jamasb -
2022 : Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Biomolecular Structures and Interaction Networks »
Arian Jamasb · Ramon Viñas Torné · Eric Ma · Yuanqi Du · Charles Harris · Kexin Huang · Dominic Hall · Pietro Lió · Tom Blundell -
2022 : Evaluating Self-Supervised Learned Molecular Graphs »
Hanchen Wang · Shengchao Liu · Jean Kaddour · Qi Liu · Jian Tang · Matt Kusner · Joan Lasenby -
2022 : GAUCHE: A Library for Gaussian Processes in Chemistry »
Ryan-Rhys Griffiths · Leo Klarner · Henry Moss · Aditya Ravuri · Sang Truong · Yuanqi Du · Arian Jamasb · Julius Schwartz · Austin Tripp · Bojana Ranković · Philippe Schwaller · Gregory Kell · Anthony Bourached · Alexander Chan · Jacob Moss · Chengzhi Guo · Alpha Lee · Jian Tang -
2022 : Flaky Performances when Pre-Training on Relational Databases with a Plan for Future Characterization Efforts »
Shengchao Liu · David Vazquez · Jian Tang · Pierre-André Noël -
2023 : On Robustness-Accuracy Characterization of Large Language Models using Synthetic Datasets »
Ching-Yun (Irene) Ko · Pin-Yu Chen · Payel Das · Yung-Sung Chuang · Luca Daniel -
2023 : Unsupervised Discovery of Steerable Factors in Graphs »
Shengchao Liu · Chengpeng Wang · Weili Nie · Hanchen Wang · Jiarui Lu · Bolei Zhou · Jian Tang -
2023 : Score-based Enhanced Sampling for Protein Molecular Dynamics »
Jiarui Lu · Bozitao Zhong · Jian Tang -
2023 : Evolving Computation Graphs »
Andreea Deac · Jian Tang -
2023 Oral: ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts »
Minghao Xu · Xinyu Yuan · Santiago Miret · Jian Tang -
2023 Poster: A Group Symmetric Stochastic Differential Equation Model for Molecule Multi-modal Pretraining »
Shengchao Liu · Weitao Du · Zhiming Ma · Hongyu Guo · Jian Tang -
2023 Poster: Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction »
Minghao Guo · Veronika Thost · Samuel Song · Adithya Balachandran · Payel Das · Jie Chen · Wojciech Matusik -
2023 Poster: FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning »
Songtao Liu · Zhengkai Tu · Minkai Xu · Zuobai Zhang · Lu Lin · Zhitao Ying · Jian Tang · Peilin Zhao · Dinghao Wu -
2023 Poster: ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts »
Minghao Xu · Xinyu Yuan · Santiago Miret · Jian Tang -
2023 Poster: Reprogramming Pretrained Language Models for Antibody Sequence Infilling »
Igor Melnyk · Vijil Chenthamarakshan · Pin-Yu Chen · Payel Das · Amit Dhurandhar · Inkit Padhi · Devleena Das -
2022 Workshop: The First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward »
Huaxiu Yao · Hugo Larochelle · Percy Liang · Colin Raffel · Jian Tang · Ying Wei · Saining Xie · Eric Xing · Chelsea Finn -
2022 Poster: Generative Coarse-Graining of Molecular Conformations »
Wujie Wang · Minkai Xu · Chen Cai · Benjamin Kurt Miller · Tess Smidt · Yusu Wang · Jian Tang · Rafael Gomez-Bombarelli -
2022 Poster: Biological Sequence Design with GFlowNets »
Moksh Jain · Emmanuel Bengio · Alex Hernandez-Garcia · Jarrid Rector-Brooks · Bonaventure Dossou · Chanakya Ekbote · Jie Fu · Tianyu Zhang · Michael Kilgour · Dinghuai Zhang · Lena Simine · Payel Das · Yoshua Bengio -
2022 Spotlight: Biological Sequence Design with GFlowNets »
Moksh Jain · Emmanuel Bengio · Alex Hernandez-Garcia · Jarrid Rector-Brooks · Bonaventure Dossou · Chanakya Ekbote · Jie Fu · Tianyu Zhang · Michael Kilgour · Dinghuai Zhang · Lena Simine · Payel Das · Yoshua Bengio -
2022 Spotlight: Generative Coarse-Graining of Molecular Conformations »
Wujie Wang · Minkai Xu · Chen Cai · Benjamin Kurt Miller · Tess Smidt · Yusu Wang · Jian Tang · Rafael Gomez-Bombarelli -
2022 Poster: Neural-Symbolic Models for Logical Queries on Knowledge Graphs »
Zhaocheng Zhu · Mikhail Galkin · Zuobai Zhang · Jian Tang -
2022 Spotlight: Neural-Symbolic Models for Logical Queries on Knowledge Graphs »
Zhaocheng Zhu · Mikhail Galkin · Zuobai Zhang · Jian Tang -
2021 Poster: Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design »
Yue Cao · Payel Das · Vijil Chenthamarakshan · Pin-Yu Chen · Igor Melnyk · Yang Shen -
2021 Spotlight: Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design »
Yue Cao · Payel Das · Vijil Chenthamarakshan · Pin-Yu Chen · Igor Melnyk · Yang Shen -
2019 Poster: Trimming the $\ell_1$ Regularizer: Statistical Analysis, Optimization, and Applications to Deep Learning »
Jihun Yun · Peng Zheng · Eunho Yang · Aurelie Lozano · Aleksandr Aravkin -
2019 Oral: Trimming the $\ell_1$ Regularizer: Statistical Analysis, Optimization, and Applications to Deep Learning »
Jihun Yun · Peng Zheng · Eunho Yang · Aurelie Lozano · Aleksandr Aravkin -
2017 Poster: Sparse + Group-Sparse Dirty Models: Statistical Guarantees without Unreasonable Conditions and a Case for Non-Convexity »
Eunho Yang · Aurelie Lozano -
2017 Talk: Sparse + Group-Sparse Dirty Models: Statistical Guarantees without Unreasonable Conditions and a Case for Non-Convexity »
Eunho Yang · Aurelie Lozano