Learning effective protein representations is critical for a variety of biological tasks, such as predicting protein function or structure. Existing approaches usually pretrain protein language models on a large number of unlabeled amino acid sequences and then finetune the models with some labeled data for downstream tasks. Despite the effectiveness of sequence-based approaches, the power of pretraining on known protein structures, which are available only in smaller numbers, has not been explored for protein property prediction, even though protein structures are known to be determinants of protein function. In this paper, we propose to pretrain protein representations according to their 3D structures. We first present a simple yet effective encoder to learn the geometric features of a protein. We then pretrain this protein graph encoder by leveraging multiview contrastive learning and different self-prediction tasks. Experimental results on both function prediction and fold classification tasks show that our proposed pretraining methods outperform or are on par with state-of-the-art sequence-based methods while using much less data. All code and models will be published upon acceptance.
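The abstract names multiview contrastive learning without spelling out the objective. Purely as an illustration, the sketch below computes a generic one-directional InfoNCE loss over paired embeddings of two views of the same proteins. The function names, the use of cosine similarity, and the temperature value are assumptions for this example, not the authors' implementation, and how the views are built from a protein structure is not shown.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(u: Vector, v: Vector, eps: float = 1e-12) -> float:
    """Cosine similarity; eps guards against division by zero for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def info_nce_loss(view_a: List[Vector], view_b: List[Vector], temperature: float = 0.07) -> float:
    """Mean one-directional InfoNCE loss over a batch (hypothetical helper).

    view_a[i] and view_b[i] are embeddings of two views of the same protein i
    (the positive pair); every view_b[j] with j != i acts as a negative.
    """
    if not view_a or len(view_a) != len(view_b):
        raise ValueError("views must be non-empty and of equal batch size")
    total = 0.0
    for i, anchor in enumerate(view_a):
        # Temperature-scaled similarity of the anchor to every candidate in the other view.
        logits = [cosine_similarity(anchor, other) / temperature for other in view_b]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Cross-entropy with the matching index i treated as the correct "class".
        total += log_sum_exp - logits[i]
    return total / len(view_a)


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[0.9, 0.1], [0.1, 0.9]]
    swapped = [[0.1, 0.9], [0.9, 0.1]]
    print(info_nce_loss(anchors, matched))  # small: each positive is the most similar candidate
    print(info_nce_loss(anchors, swapped))  # large: each positive is the least similar candidate
```

Minimizing a loss of this form pulls the two views of each protein together in embedding space while pushing apart views of different proteins in the same batch.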
Author Information
Zuobai Zhang (Mila)
Minghao Xu (Montreal Institute for Learning Algorithms, University of Montreal)
Arian Jamasb (University of Cambridge)
Vijil Chenthamarakshan (IBM Research)
Aurelie Lozano (IBM)
Payel Das (IBM Research AI)
Jian Tang (Mila)
More from the Same Authors
2020 : (#95 / Sess. 2) Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Protein Structures
Arian Jamasb
2022 : Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Biomolecular Structures and Interaction Networks
Arian Jamasb · Ramon Viñas Torné · Eric Ma · Yuanqi Du · Charles Harris · Kexin Huang · Dominic Hall · Pietro Lió · Tom Blundell
2022 : Evaluating Self-Supervised Learned Molecular Graphs
Hanchen Wang · Shengchao Liu · Jean Kaddour · Qi Liu · Jian Tang · Matt Kusner · Joan Lasenby
2022 : GAUCHE: A Library for Gaussian Processes in Chemistry
Ryan-Rhys Griffiths · Leo Klarner · Henry Moss · Aditya Ravuri · Sang Truong · Yuanqi Du · Arian Jamasb · Julius Schwartz · Austin Tripp · Bojana Ranković · Philippe Schwaller · Gregory Kell · Anthony Bourached · Alexander Chan · Jacob Moss · Chengzhi Guo · Alpha Lee · Jian Tang
2022 : Flaky Performances when Pre-Training on Relational Databases with a Plan for Future Characterization Efforts
Shengchao Liu · David Vazquez · Jian Tang · Pierre-André Noël
2022 Workshop: The First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward
Huaxiu Yao · Hugo Larochelle · Percy Liang · Colin Raffel · Jian Tang · Ying WEI · Saining Xie · Eric Xing · Chelsea Finn
2022 Poster: Generative Coarse-Graining of Molecular Conformations
Wujie Wang · Minkai Xu · Chen Cai · Benjamin Kurt Miller · Tess Smidt · Yusu Wang · Jian Tang · Rafael Gomez-Bombarelli
2022 Poster: Biological Sequence Design with GFlowNets
Moksh Jain · Emmanuel Bengio · Alex Hernandez-Garcia · Jarrid Rector-Brooks · Bonaventure Dossou · Chanakya Ekbote · Jie Fu · Tianyu Zhang · Michael Kilgour · Dinghuai Zhang · Lena Simine · Payel Das · Yoshua Bengio
2022 Spotlight: Biological Sequence Design with GFlowNets
Moksh Jain · Emmanuel Bengio · Alex Hernandez-Garcia · Jarrid Rector-Brooks · Bonaventure Dossou · Chanakya Ekbote · Jie Fu · Tianyu Zhang · Michael Kilgour · Dinghuai Zhang · Lena Simine · Payel Das · Yoshua Bengio
2022 Spotlight: Generative Coarse-Graining of Molecular Conformations
Wujie Wang · Minkai Xu · Chen Cai · Benjamin Kurt Miller · Tess Smidt · Yusu Wang · Jian Tang · Rafael Gomez-Bombarelli
2022 Poster: Neural-Symbolic Models for Logical Queries on Knowledge Graphs
Zhaocheng Zhu · Mikhail Galkin · Zuobai Zhang · Jian Tang
2022 Spotlight: Neural-Symbolic Models for Logical Queries on Knowledge Graphs
Zhaocheng Zhu · Mikhail Galkin · Zuobai Zhang · Jian Tang
2021 Poster: Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design
yue cao · Payel Das · Vijil Chenthamarakshan · Pin-Yu Chen · Igor Melnyk · Yang Shen
2021 Spotlight: Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design
yue cao · Payel Das · Vijil Chenthamarakshan · Pin-Yu Chen · Igor Melnyk · Yang Shen
2019 Poster: Trimming the $\ell_1$ Regularizer: Statistical Analysis, Optimization, and Applications to Deep Learning
Jihun Yun · Peng Zheng · Eunho Yang · Aurelie Lozano · Aleksandr Aravkin
2019 Oral: Trimming the $\ell_1$ Regularizer: Statistical Analysis, Optimization, and Applications to Deep Learning
Jihun Yun · Peng Zheng · Eunho Yang · Aurelie Lozano · Aleksandr Aravkin
2017 Poster: Sparse + Group-Sparse Dirty Models: Statistical Guarantees without Unreasonable Conditions and a Case for Non-Convexity
Eunho Yang · Aurelie Lozano
2017 Talk: Sparse + Group-Sparse Dirty Models: Statistical Guarantees without Unreasonable Conditions and a Case for Non-Convexity
Eunho Yang · Aurelie Lozano