Workshop: The First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward

Evaluating Self-Supervised Learned Molecular Graphs

Hanchen Wang · Shengchao Liu · Jean Kaddour · Qi Liu · Jian Tang · Matt Kusner · Joan Lasenby


Because of data scarcity in real-world scenarios, obtaining pre-trained representations via self-supervised learning (SSL) has attracted increasing interest. Although various methods have been proposed, what knowledge the networks learn from the pre-training tasks, and how it relates to downstream properties, remains under-explored. In this work, with an emphasis on chemical molecular graphs, we fill this gap by devising a range of node-level, pair-level, and graph-level probe tasks to analyse the representations from pre-trained graph neural networks (GNNs). We empirically show that: 1. Pre-trained models achieve better downstream performance than randomly initialised models because of their improved capability to capture global topology and recognise substructures. 2. However, randomly initialised models outperform pre-trained models at retaining local topology; in pre-trained models, such information gradually disappears from the early layers to the last layers.
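To make the idea of a probe task concrete, the following is a minimal sketch of a node-level linear probe, not the authors' implementation. It assumes that frozen node embeddings from one GNN layer are available as a NumPy array and that the probe target is a simple local property such as node degree. The embeddings and targets here are synthetic stand-ins, and the names `fit_linear_probe` and `probe_r2` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def fit_linear_probe(features, targets, l2=1e-3):
    """Fit a ridge-regression probe on frozen features (closed form)."""
    # Append a constant column so the probe can learn a bias term.
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ targets)


def probe_r2(weights, features, targets):
    """Coefficient of determination of the probe on held-out nodes."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    pred = X @ weights
    ss_res = np.sum((targets - pred) ** 2)
    ss_tot = np.sum((targets - targets.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
n_nodes, dim = 400, 16

# Synthetic node-level target (assumption): a degree between 1 and 4.
degree = rng.integers(1, 5, size=n_nodes).astype(float)

# Synthetic stand-ins for frozen embeddings from two different layers.
# One encodes the degree in its first coordinate, the other is pure noise.
informative = rng.normal(size=(n_nodes, dim))
informative[:, 0] = degree + 0.1 * rng.normal(size=n_nodes)
uninformative = rng.normal(size=(n_nodes, dim))

train, test = slice(0, 300), slice(300, None)
scores = {}
for name, emb in {"informative": informative, "uninformative": uninformative}.items():
    w = fit_linear_probe(emb[train], degree[train])
    scores[name] = probe_r2(w, emb[test], degree[test])

print(scores)
```

Under this setup, a high held-out score suggests that the probed property is linearly recoverable from the representation, and a score near zero suggests it is not. Repeating the probe on embeddings taken from successive layers is one way to trace where such information fades.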
