Workshop
Graph Representation Learning and Beyond (GRL+)
Petar Veličković · Michael M. Bronstein · Andreea Deac · Will Hamilton · Jessica Hamrick · Milad Hashemi · Stefanie Jegelka · Jure Leskovec · Renjie Liao · Federico Monti · Yizhou Sun · Kevin Swersky · Rex (Zhitao) Ying · Marinka Zitnik
Thu 16 Jul, 11:40 p.m. PDT
Keywords: Graph Representation Learning Graph Neural Networks Geometric Deep Learning
Recent years have seen a surge in research on graph representation learning, including techniques for deep graph embeddings, generalizations of CNNs to graphstructured data, and neural messagepassing approaches. These advances in graph neural networks and related techniques have led to new stateoftheart results in numerous domains: chemical synthesis, 3Dvision, recommender systems, question answering, continuous control, selfdriving and social network analysis. Building on the successes of three related workshops from last year (at ICML, ICLR and NeurIPS), the primary goal for this workshop is to facilitate community building, and support expansion of graph representation learning into more interdisciplinary projects with the natural and social sciences. With hundreds of new researchers beginning projects in this area, we hope to bring them together to consolidate this fastgrowing area into a healthy and vibrant subfield. Especially, we aim to strongly promote novel and exciting applications of graph representation learning across the sciences, reflected in our choices of invited speakers.
Schedule
Thu 11:40 p.m.  12:00 a.m.

Opening Remarks
(
Introduction
)
>

Petar Veličković · AndreeaIoana Deac 🔗 
Fri 12:00 a.m.  12:30 a.m.

Invited Talk: Xavier Bresson
(
Talk
)
>
SlidesLive Video 
Xavier Bresson 🔗 
Fri 12:30 a.m.  1:00 a.m.

Invited Talk: Thomas Kipf
(
Talk
)
>
SlidesLive Video 
Thomas Kipf 🔗 
Fri 1:00 a.m.  1:30 a.m.

Q&A / Discussions / Coffee 1
(
Discussion
)
>

🔗 
Fri 1:30 a.m.  2:30 a.m.

Virtual Poster Session #1
(
Poster Session
)
>

🔗 
Fri 2:30 a.m.  2:40 a.m.

Novel Applications: WikiCS: A WikipediaBased Benchmark for Graph Neural Networks
(
Spotlight
)
>
SlidesLive Video We present WikiCS, a novel dataset derived from Wikipedia for benchmarking Graph Neural Networks. The dataset consists of nodes corresponding to Computer Science articles, with edges based on hyperlinks and 10 classes representing different branches of the field. We use the dataset to evaluate semisupervised node classification and singlerelation link prediction models. Our experiments show that these methods perform well on a new domain, with structural properties different from earlier benchmarks. We will make the dataset publicly available for future work. 

Fri 2:40 a.m.  2:50 a.m.

Novel Applications: Graph Neural Networks for Massive MIMO Detection
(
Spotlight
)
>
SlidesLive Video In this paper, we innovately use graph neural networks (GNNs) to learn a messagepassing solution for the inference task of massive multiple multipleinput multipleoutput (MIMO) detection in wireless communication. We adopt a graphical model based on the Markov random field (MRF) where belief propagation (BP) yields poor results when it assumes a uniform prior over the transmitted symbols. Numerical simulations show that, under the uniform prior assumption, our GNNbased MIMO detection solution outperforms the minimum meansquared error (MMSE) baseline detector, in contrast to BP. Furthermore, experiments demonstrate that the performance of the algorithm slightly improves by incorporating MMSE information into the prior. 

Fri 2:50 a.m.  3:00 a.m.

Novel Applications: Embedding a random graph via GNN: Extended meanfield inference theory and RL applications to NPHard multirobot/machine scheduling
(
Spotlight
)
>
SlidesLive Video We illustrate that developing a theory of ‘how to embed a random graph using GNN’ is the key to achieving the first nearoptimal learningbased scheduling algorithm for an NPhard multirobot scheduling problem for tasks with timevarying rewards. We focus on a problem referred to as a MultiRobot Reward Collection (MRRC) problem, of which immediate applications are ridesharing and pickupanddelivery problems. We 1) observe that states in our robot scheduling problems can be represented as an extension of probabilistic graphical models (PGMs), which we refer to as random PGMs, and 2) develop a meanfield inference method for random PGMs. We then prove that a simple heuristic for applying deep graph encoder for random graph embedding is theoretically justified. We illustrate how a twostep hierarchical inference induces precise Qfunction estimation. We empirically demonstrate that our method achieves nearoptimality (plus transferability and scalability, machine scheduling (IPMS) applications in the appendix section). Arxiv preprint: https://arxiv.org/abs/1905.12204. 

Fri 3:00 a.m.  3:10 a.m.

Update: PyTorch Geometric
(
Demonstration
)
>
SlidesLive Video 
Matthias Fey 🔗 
Fri 3:10 a.m.  3:20 a.m.

Update: Deep Graph Library
(
Demonstration
)
>
SlidesLive Video 
George Karypis 🔗 
Fri 3:20 a.m.  3:30 a.m.

Update: Open Graph Benchmark
(
Demonstration
)
>
SlidesLive Video 
Jure Leskovec 🔗 
Fri 3:30 a.m.  4:30 a.m.

Lunch Break
(
Break
)
>

🔗 
Fri 4:30 a.m.  4:45 a.m.

Original Research: Get Rid of Suspended Animation: Deep Diffusive Neural Network for Graph Representation Learning
(
Spotlight
)
>
SlidesLive Video Existing graph neural networks may suffer from the “suspended animation problem” when the model architecture goes deep. Meanwhile, for some graph learning scenarios, e.g., nodes with text/image attributes or graphs with long distance node correlations, deep graph neural networks will be necessary for effective graph representation learning. In this paper, we propose a new graph neural network, namely DIFNET (Graph Diffusive Neural Network), for deep graph representation learning and node classification. DIFNET utilizes both neural gates and graph residual learning for node hidden state modeling, and includes an attention mechanism for node neighborhood information diffusion. Extensive experimental results can illustrate both the learning performance advantages of DIFNET compared with existing methods, especially in addressing the “suspended animation problem”. 

Fri 4:45 a.m.  5:00 a.m.

Original Research: Learning Graph Models for TemplateFree Retrosynthesis
(
Spotlight
)
>
SlidesLive Video Retrosynthesis prediction is a fundamental problem in organic synthesis, where the task is to identify precursor molecules that can be used to synthesize a target molecule. Despite advancements in neural retrosynthesis algorithms, they are unable to fully recapitulate the strategies employed by chemists and do not generalize well to infrequent reactions. In this paper, we propose a graphbased approach that capitalizes on the idea that graph topology of precursor molecules is largely unaltered during the reaction. The model first predicts the set of graph edits transforming the target into incomplete molecules called synthons. Next, the model learns to expand synthons into complete molecules by attaching relevant leaving groups. Since the model operates at the level of molecular fragments, it avoids full generation, greatly simplifying the underlying architecture and improving its ability to generalize. The model yields 11.7% absolute improvement over stateoftheart approaches on the USPTO50k dataset. 

Fri 5:00 a.m.  5:15 a.m.

Original Research: Frequent Subgraph Mining by Walking in Order Embedding Space
(
Spotlight
)
>
SlidesLive Video Identifying frequent subgraphs or network motifs has been crucial in analyzing and predicting properties of realworld networks. However, finding large commonlyoccurring motifs remains an open problem due to its NPhard subroutine of subgraph counting, and the combinatorial growth of the number of possible subgraphs with their size. Here we present Subgraph Pattern Miner (SPMiner), a novel neural approach to finding frequent subgraphs in a large target graph. SPMiner integrates graph neural networks, order embedding space, and an efficient search strategy to identify network subgraph patterns that appear most frequently in a target graph dataset. SPMiner first decomposes the target graph into many overlapping subgraphs and then encodes the subgraphs into order embeddings. SPMiner then uses a monotonic walk in the order embedding space to identify frequent motifs. Compared to existing approaches and possible neural alternatives, SPMiner is more accurate, faster, and more scalable. For 5 and 6node motifs, we show that SPMiner can identify almost all of the most frequent motifs while being 100x faster than exact enumeration methods. In addition, SPMiner can also reliably identify frequent 10node motifs, which is well beyond the size limit of exact enumeration approaches. And last, We show that SPMiner can find large 10+ node motifs that appear 10100x more frequently than those found by current widelyused approximate methods. 

Fri 5:15 a.m.  5:45 a.m.

Invited Talk: Kyle Cranmer
(
Talk
)
>
SlidesLive Video 
Kyle Cranmer 🔗 
Fri 5:45 a.m.  6:15 a.m.

Invited Talk: Danai Koutra
(
Talk
)
>
SlidesLive Video 
Danai Koutra 🔗 
Fri 6:15 a.m.  6:45 a.m.

Q&A / Discussions / Coffee 2
(
Discussion
)
>

🔗 
Fri 6:45 a.m.  6:55 a.m.

COVID19 Applications: Navigating the Dynamics of Financial Embeddings over Time
(
Spotlight
)
>
SlidesLive Video Financial transactions constitute connections between entities and through these connections a large scale heterogeneous weighted graph is formulated. In this labyrinth of interactions that are continuously updated, there exists a variety of similaritybased patterns that can provide insights into the dynamics of the financial system. With the current work, we propose the application of Graph Representation Learning in a scalable dynamic setting as a means of capturing these patterns in a meaningful and robust way. We proceed to perform a rigorous qualitative analysis of the latent trajectories to extract real world insights from the proposed representations and their evolution over time that is to our knowledge the first of its kind in the financial sector. Shifts in the latent space are associated with known economic events and in particular the impact of the recent Covid19 pandemic to consumer patterns. Capturing such patterns indicates the value added to financial modeling through the incorporation of latent graph representations. 

Fri 6:55 a.m.  7:05 a.m.

COVID19 Applications: Integrating Logical Rules Into Neural MultiHop Reasoning for Drug Repurposing
(
Spotlight
)
>
SlidesLive Video The graph structure of biomedical data differs from those in typical knowledge graph benchmarking tasks. A particular property of biomedical data is the presence of longrange dependencies, which can be captured by patterns described as logical rules. We propose a novel method that combines these rules with a neural multihop reasoning approach that uses reinforcement learning. We conduct an empirical study based on the realworld task of drug repurposing by formulating this task as a link prediction problem. We apply our method to the biomedical knowledge graph Hetionet and show that our approach outperforms several baseline methods. 

Fri 7:05 a.m.  7:15 a.m.

COVID19 Applications: Gaining insight into SARSCoV2 infection and COVID19 severity using selfsupervised edge features and Graph Neural Networks
(
Spotlight
)
>
SlidesLive Video Graph Neural Networks (GNN) have been extensively used to extract meaningful representations from graph structured data and to perform predictive tasks such as node classification and link prediction. In recent years, there has been a lot of work incorporating edge features along with node features for prediction tasks. In this work, we present a framework for creating new edge features, via a combination of selfsupervised and unsupervised learning which we then use along with node features for node classification tasks. We validate our work on two biological datasets comprising of singlecell RNA sequencing data of in vitro SARSCoV2 infection and human COVID19 patients. We demonstrate that our method achieves better performance over baseline Graph Attention Network (GAT) and Graph Convolutional Network (GCN) models. Furthermore, given the attention mechanism on edge and node features, we are able to interpret the cell types and genes that determine the course and severity of COVID19, contributing to a growing list of potential disease biomarkers and therapeutic targets. 

Fri 7:15 a.m.  7:45 a.m.

Invited Talk: Tina EliassiRad
(
Talk
)
>
SlidesLive Video 
Tina EliassiRad 🔗 
Fri 7:45 a.m.  8:15 a.m.

Invited Talk: Raquel Urtasun
(
Talk
)
>
SlidesLive Video 
Raquel Urtasun 🔗 
Fri 8:15 a.m.  8:45 a.m.

Q&A / Discussions / Coffee 3
(
Discussion
)
>

🔗 
Fri 8:45 a.m.  9:45 a.m.

Virtual Poster Session #2
(
Poster Session
)
>

🔗 
Fri 9:45 a.m.  10:00 a.m.

Closing Remarks
(
Conclusions
)
>

Petar Veličković 🔗 


(#2 / Sess. 1) When Spectral Domain Meets Spatial Domain in Graph Neural Networks
(
Poster Teaser
)
>
SlidesLive Video Convolutional Graph Neural Networks (ConvGNNs) are designed either in the spectral domain or in the spatial domain. In this paper, we provide a theoretical framework to analyze these neural networks, by deriving some equivalence of the graph convolution processes, regardless if they are designed in the spatial or the spectral domain. We demonstrate the relevance of the proposed framework by providing a spectral analysis of the most popular ConvGNNs (ChebNet, CayleyNet, GCN and Graph Attention Networks), which allows to explain their performance and shows their limits. 



(#3 / Sess. 2) Spectraldesigned Depthwise Separable Graph Neural Networks
(
Poster Teaser
)
>
SlidesLive Video This paper aims at revisiting Convolutional Graph Neural Networks (ConvGNNs) by designing new graph convolutions in spectral domain with a custom frequency profile while applying them in the spatial domain. Within the proposed framework, we propose two ConvGNNs methods: one using a simple singleconvolution kernel that operates as a lowpass filter, and one operating multiple convolution kernels called Depthwise Separable Graph Convolution Network (DSGCN). The latter is a generalization of the depthwise separable convolution framework for graph convolutional networks, which allows to decrease the total number of trainable parameters while keeping the capacity of the model unchanged. Our proposals are evaluated on both transductive and inductive graph learning problems, demonstrating that DSGCN outperforms the stateoftheart methods. 



(#8 / Sess. 2) Practical Adversarial Attacks on Graph Neural Networks
(
Poster Teaser
)
>
SlidesLive Video We study the blackbox attacks on graph neural networks (GNNs) under a novel and realistic constraint: attackers have access to only a subset of nodes in the network, and they can only attack a small number of them. A node selection step is essential under this setup. We demonstrate that the structural inductive biases of GNN models can be an effective source for this type of attacks. Specifically, by exploiting the connection between the backward propagation of GNNs and random walks, we show that the common gradientbased whitebox attacks can be generalized to the blackbox setting via the connection between the gradient and an importance score similar to PageRank. In practice, we find attacks based on this importance score indeed increase the classification loss by a large margin, but they fail to significantly increase the misclassification rate. Our further analyses suggest that there is a discrepancy between the loss and misclassification rate, as the latter presents a diminishingreturn pattern when the number of attacked nodes increases. Therefore, we propose a greedy procedure to correct the importance score that takes into account of the diminishingreturn pattern. Experimental results show that the proposed procedure can significantly increase the misclassification rate of common GNNs on realworld data without access to model parameters nor predictions. 



(#9 / Sess. 1) Graph Neural Networks in TensorFlow and Keras with Spektral
(
Poster Teaser
)
>
SlidesLive Video In this paper we present Spektral, an opensource Python library for building graph neural networks with TensorFlow and the Keras application programming interface. Spektral implements a large set of methods for deep learning on graphs, including messagepassing and pooling operators, as well as utilities for processing graphs and loading popular benchmark datasets. The purpose of this library is to provide the essential building blocks for creating graph neural networks, focusing on the guiding principles of userfriendliness and quick prototyping on which Keras is based. Spektral is, therefore, suitable for absolute beginners and expert deep learning practitioners alike. In this work, we present an overview of Spektral's features and report the performance of the methods implemented by the library in scenarios of node classification, graph classification, and graph regression. 



(#10 / Sess. 2) MultiGraph Neural Operator for Parametric Partial Differential Equations
(
Poster Teaser
)
>
SlidesLive Video Graph neural networks (GNNs) have gained popularity in simulating physical systems and solving partial differential equations (PDEs) since graphs offer a natural way of modeling particle interactions and discretizing the continuum models. However, the graphs constructed for approximating such tasks usually ignore longrange interactions due to unfavorable scaling of the computational complexity with respect to the number of nodes. The errors due to these approximations scale with the discretization of the system, thereby not allowing for generalization under meshrefinement. Inspired by the classical multipole methods, we propose a novel multilevel graph neural network framework that captures interaction at all ranges with only linear complexity. Our multilevel formulation is equivalent to recursively adding inducing points to the kernel matrix, unifying GNNs with multiresolution matrix factorization of the kernel. Experiments confirm our multigraph network learns discretizationinvariant solution operators to PDEs and can be evaluated in linear time. 



(#12 / Sess. 1) Deep Graph Contrastive Representation Learning
(
Poster Teaser
)
>
SlidesLive Video Graph representation learning nowadays becomes fundamental in analyzing graphstructured data. Inspired by recent success of contrastive methods, in this paper, we propose a novel framework for unsupervised graph representation learning by leveraging a contrastive objective at the node level. Specifically, we generate two graph views by corruption and learn node representations by maximizing the agreement of node representations in these two views. To provide diverse node contexts for the contrastive objective, we propose a hybrid scheme for generating graph views on both structure and attribute levels. We perform empirical experiments on both transductive and inductive learning tasks using a variety of realworld datasets. Experimental experiments demonstrate that despite its simplicity, our proposed method consistently outperforms existing stateoftheart methods by large margins. Notably, our method gains about 10% absolute improvements on protein function prediction. Our unsupervised method even surpasses its supervised counterparts on transductive tasks. 



(#15 / Sess. 2) Learning Distributed Representations of Graphs with Geo2DR
(
Poster Teaser
)
>
SlidesLive Video We present Geo2DR (Geometric to Distributed Representations), a Python library for unsupervised learning on graphstructured data using discrete substructure patterns and neural language models. It contains efficient implementations of popular graph decomposition algorithms and neural language models in PyTorch which are combined to learn representations using the distributive hypothesis. Furthermore, Geo2DR comes with general data processing and loading methods which can bring substantial speedup in the training of the neural language models. Through this we provide a unified set of tools and methods to quickly construct systems capable of learning distributed representations of graphs. This is useful for replication of existing methods, modification, or development of novel systems. This paper serves to present the Geo2DR library and perform a comprehensive comparative analysis of existing methods reimplemented using Geo2DR across several widely used graph classification benchmarks. Geo2DR displays a high reproducibility of results in published methods and interoperability with other libraries useful for distributive language modelling. 



(#18 / Sess. 1) Hierarchical Protein Function Prediction with TailGNNs
(
Poster Teaser
)
>
SlidesLive Video Protein function prediction may be framed as predicting subgraphs (with certain closure properties) of a directed acyclic graph describing the hierarchy of protein functions. Graph neural networks (GNNs), with their builtin inductive bias for relational data, are hence naturally suited for this task. However, in contrast with most GNN applications, the graph is not related to the input, but to the label space. Accordingly, we propose TailGNNs, neural networks which naturally compose with the output space of any neural network for multitask prediction, to provide relationallyreinforced labels. For protein function prediction, we combine a TailGNN with a dilated convolutional network which learns representations of the protein sequence, making significant improvement in F_1 score and demonstrating the ability of TailGNNs to learn useful representations of labels and exploit them in realworld problem solving. 



(#19 / Sess. 1) Neural Bipartite Matching
(
Poster Teaser
)
>
SlidesLive Video Graph neural networks (GNNs) have found application for learning in the space of algorithms. However, the algorithms chosen by existing research (sorting, BreadthFirst search, shortest path finding, etc.) usually align perfectly with a standard GNN architecture. This report describes how neural execution is applied to a complex algorithm, such as finding maximum bipartite matching by reducing it to a flow problem and using FordFulkerson to find the maximum flow. This is achieved via neural execution based only on features generated from a single GNN. The evaluation shows strongly generalising results with the network achieving optimal matching almost 100% of the time. 



(#20 / Sess. 1) Principal Neighbourhood Aggregation for Graph Nets
(
Poster Teaser
)
>
SlidesLive Video Graph Neural Networks (GNNs) have been shown to be effective models for different predictive tasks on graphstructured data. Recent work on their expressive power has focused on isomorphism tasks and countable feature spaces. We extend this theoretical framework to include continuous featureswhich occur regularly in realworld input domains and within the hidden layers of GNNsand we demonstrate the requirement for multiple aggregation functions in this context. Accordingly, we propose Principal Neighbourhood Aggregation (PNA), a novel architecture combining multiple aggregators with degreescalers (which generalize the sum aggregator). Finally, we compare the capacity of different models to capture and exploit the graph structure via a novel benchmark containing multiple tasks taken from classical graph theory, alongside existing benchmarks from realworld domains, all of which demonstrate the strength of our model. 



(#21 / Sess. 2) Graph Neural Networks for the Prediction of SubstrateSpecific Organic Reaction Conditions
(
Poster Teaser
)
>
SlidesLive Video We present a systematic investigation using graph neural networks (GNNs) to model organic chemical reactions. To do so, we prepared a dataset collection of four ubiquitous reactions from the organic chemistry literature. We evaluate seven different GNN architectures for classification tasks pertaining to the identification of experimental reagents and conditions. We find that models are able to identify specific graph features that affect reaction conditions and lead to accurate predictions. The results herein show great promise in advancing molecular machine learning. 



(#23 / Sess. 2) A Note on OverSmoothing for Graph Neural Networks
(
Poster Teaser
)
>
SlidesLive Video Graph Neural Networks (GNNs) have achieved a lot of success on graphstructured data. However, it is observed that the performance of graph neural networks does not improve as the number of layers increases. This effect, known as oversmoothing, has been analyzed mostly in linear cases. In this paper, we build upon previous results \cite{oono2019graph} to further analyze the oversmoothing effect in the general graph neural network architecture. We show when the weight matrix satisfies the conditions determined by the spectrum of augmented normalized Laplacian, the Dirichlet energy of embeddings will converge to zero, resulting in the loss of discriminative power. Using Dirichlet energy to measure ``expressiveness" of embedding is conceptually clean; it leads to simpler proofs than \cite{oono2019graph} and can handle more nonlinearities. 



(#24 / Sess. 2) DegreeQuant: QuantizationAware Training for Graph Neural Networks
(
Poster Teaser
)
>
SlidesLive Video Graph neural networks have demonstrated strong performance modelling nonuniform structured data. However, there exists little research exploring methods to make them more efficient at inference time. In this work, we explore the viability of training quantized GNNs models, enabling the usage of low precision integer arithmetic for inference. We propose a method, DegreeQuant, to improve performance over existing quantizationaware training baselines commonly used on other architectures, such as CNNs. Our work demonstrates that it is possible to train models using 8bit integer arithmetic at inferencetime with similar accuracy to their full precision counterparts. 



(#26 / Sess. 2) Message Passing Query Embedding
(
Poster Teaser
)
>
SlidesLive Video Recent works on representation learning for Knowledge Graphs have moved beyond the problem of link prediction, to answering queries of an arbitrary structure. Existing methods are based on adhoc mechanisms that require training with a diverse set of query structures. We propose a more general architecture that employs a graph neural network to encode a graph representation of the query, where nodes correspond to entities and variables. The generality of our method allows it to encode a more diverse set of query types in comparison to previous work. Our method shows competitive performance against previous models for complex queries, and in contrast with these models, it can answer complex queries when trained for link prediction only. We show that the model learns entity embeddings that capture the notion of entity type without explicit supervision. 



(#27 / Sess. 2) Learning Graph Structure with A FiniteState Automaton Layer
(
Poster Teaser
)
>
SlidesLive Video Graphbased neural network models are producing strong results in a number of domains, in part because graphs provide flexibility to encode domain knowledge in the form of relational structure (edges). In practice, edges may represent intrinsic structure (e.g., abstract syntax trees of programs) or more abstract relations that aid reasoning (e.g., results of program analyses). In this work, we consider learning to derive abstract relations from intrinsic structure. Motivated by their power in program analyses, we consider relations defined by paths accepted by a finitestate automaton. We show how to learn these relations by transforming the problem into learning a policy on a graphbased POMDP with implicit differentiation. The result is a differentiable Graph FiniteState Automaton (GFSA) layer that adds a new edge type (expressed as a weighted adjacency matrix) to a base graph. We demonstrate that this layer performs well when either directly supervised or trained endtoend for a downstream task. 



(#28 / Sess. 1) Contrastive Graph Neural Network Explanation
(
Poster Teaser
)
>
SlidesLive Video Graph Neural Networks achieve remarkable results on problems with structured data but come as blackbox predictors. Transferring existing techniques, for example occlusion, to interpret models fails as even removing a single node or edge can lead to drastic changes in the graph. The resulting graphs can differ from all training examples, causing model confusion and wrong explanations. Thus, we argue that explicability must use graphs consistent with the distribution underlying the training data. We coin this property Distribution Compliant Explanation (DCE) and present a novel Contrastive GNN Explanation (CoGE) technique following this paradigm. An experimental study supports the efficacy of CoGE. 



(#29 / Sess. 2) Fewshot link prediction via graph neural networks for Covid19 drugrepurposing
(
Poster Teaser
)
>
SlidesLive Video Predicting interactions among heterogenous graph structured data has numerous applications such as knowledge graph completion, recommendation systems and drug discovery. Often times, the links to be predicted belong to rare types such as the case in repurposing drugs for novel diseases. This motivates the task of fewshot link prediction. Typically, GCNs are illequipped in learning such rare link types since the relation embedding is not learned in an inductive fashion. This paper proposes an inductive RGCN for learning informative relation embeddings even in the fewshot learning regime. The proposed inductive model significantly outperforms the RGCN and stateoftheart KGE models in fewshot learning tasks. Furthermore, we apply our method on the drugrepurposing knowledge graph (DRKG) for discovering drugs for Covid19. We pose the drug discovery task as link prediction and learn embeddings for the biological entities that partake in the DRKG. Our initial results corroborate that several drugs used in clinical trials were identified as possible drug candidates. 



(#31 / Sess. 2) Generalized MultiRelational Graph Convolution Network
(
Poster Teaser
)
>
Graph Convolutional Networks (GCNs) have received increasing attention in recent machine learning. How to effectively leverage the rich structural information in complex graphs, such as knowledge graphs with heterogeneous types of entities and relations, is a primary open challenge in the field. Most GCN methods are either restricted to graphs with a homogeneous type of edges (e.g., citation links only), or focusing on representation learning for nodes only instead of jointly optimizing the embeddings of both nodes and edges for targetdriven objectives. This paper addresses these limitations by proposing a novel framework, namely the GEneralized Multirelational Graph Convolutional Networks (GEMGCN), which combines the power of GCNs in graphbased belief propagation and the strengths of advanced knowledgebase embedding methods, and goes beyond. Our theoretical analysis shows that GEMGCN offers an elegant unification of several wellknown GCN methods as specific cases, with a new perspective of graph convolution. Experimental results on benchmark datasets show the advantageous performance of GEMGCN over strong baseline methods in the tasks of knowledge graph alignment and entity classification. 



(#101 / Sess. 1) Graph neural induction of value iteration
(
Poster Teaser
)
>
SlidesLive Video Many reinforcement learning tasks can benefit from explicit planning based on an internal model of the environment. Previously, such planning components have been incorporated through a neural network that partially aligns with the computational graph of value iteration. Such network have so far been focused on restrictive environments (e.g. gridworlds), and modelled the planning procedure only indirectly. We relax these constraints, proposing a graph neural network (GNN) that executes the value iteration (VI) algorithm, across arbitrary environment models, with direct supervision on the intermediate steps of VI. The results indicate that GNNs are able to model value iteration accurately, recovering favourable metrics and policies across a variety of outofdistribution tests. This suggests that GNN executors with strong supervision are a viable component within deep reinforcement learning systems. 



(#34 / Sess. 1) Set2Graph: Learning Graphs From Sets
(
Poster Teaser
)
>
SlidesLive Video Many problems in machine learning (ML) can be cast as learning functions from sets to graphs, or more generally to hypergraphs; in short, Set2Graph functions. Examples include clustering, learning vertex and edge features on graphs, and learning features on triplets in a collection. A natural approach for building Set2Graph models is to characterize all linear equivariant settohypergraph layers and stack them with nonlinear activations. This posses two challenges: (i) the expressive power of these networks is not well understood; and (ii) these models would suffer from high, often intractable computational and memory complexity, as their dimension grows exponentially. This paper advocates a family of neural network models for learning Set2Graph functions that is both practical and of maximal expressive power (universal), that is, can approximate arbitrary continuous Set2Graph functions over compact sets. Testing these models on different machine learning tasks, mainly an application to particle physics, we find them favorable to existing baselines. 



(#36 / Sess. 1) A Graph VAE and Graph Transformer Approach to Generating Molecular Graphs
(
Poster Teaser
)
>
SlidesLive Video We propose a combination of a variational autoencoder and a transformer based model which fully utilises graph convolutional and graph pooling layers to operate directly on graphs. The transformer model implements a novel node encoding layer, replacing the position encoding typically used in transformers, to create a transformer with no position information that operates on graphs, encoding adjacent node properties into the edge generation process. The proposed model builds on graph generative work operating on graphs with edge features, creating a model that offers improved scalability with the number of nodes in a graph. In addition, our model is capable of learning a disentangled, interpretable latent space that represents graph properties through a mapping between latent variables and graph properties. In experiments we chose a benchmark task of molecular generation, given the importance of both generated node and edge features. Using the QM9 dataset we demonstrate that our model performs strongly across the task of generating valid, unique and novel molecules. Finally, we demonstrate that the model is interpretable by generating molecules controlled by molecular properties, and we then analyse and visualise the learned latent representation. 



(#99 / Sess. 2) GraphNets with Spectral Message Passing
(
Poster Teaser
)
>
SlidesLive Video Graph Neural Networks (GNNs) are the subject of intense focus by the machine learning community for problems involving relational reasoning. GNNs can be broadly divided into spatial and spectral approaches. Spatial approaches use a form of learned messagepassing, in which interactions among vertices are computed locally, and information propagates over longer distances on the graph with greater numbers of messagepassing steps. Spectral approaches use eigendecompositions of the graph Laplacian to produce a generalization of spatial convolutions to graph structured data which access information over short and long time scales simultaneously. Here we introduce a Spectral Graph Network, which applies message passing to both the spatial and spectral domains. Our model projects vertices of the spatial graph onto the Laplacian eigenvectors, which are each represented as vertices in a fully connected ``spectral graph'', and then applies learned message passing to them. We apply this model to various benchmark tasks including a sparse graphversion of MNIST image classification, molecular classification (MoleculeNet), and molecular property prediction (QM9). The Spectral GN promotes efficient training, reaching high performance with fewer training iterations despite having more parameters. The model also provides robustness to edge dropout and outperforms baselines for the classification tasks. 



(#37 / Sess. 1) Geometric Matrix Completion: A Functional View
(
Poster Teaser
)
>
We propose a totally functional view of geometric matrix completion problem. Differently from existing work, we propose a novel regularization inspired from the functional map literature that is more interpretable and theoretically sound. On synthetic tasks with strong underlying geometric structure, our framework outperforms state of the art by a huge margin (two order of magnitude) demonstrating the potential of our approach. On real datasets, we achieve stateoftheart results at a fraction of the computational effort of previous methods. 



(#97 / Sess. 1) Clustered Dynamic Graph CNN for Biometric 3D Hand Shape Recognition
(
Poster Teaser
)
>
SlidesLive Video The research in biometric recognition using hand shape has been somewhat stagnating in the last decade. Meanwhile, computer vision and machine learning have experienced a paradigm shift with the renaissance of deep learning, which has set the new stateoftheart in many related fields. Inspired by successful applications of deep learning for other biometric modalities, we propose a novel approach to 3D hand shape recognition from RGBD data based on geometric deep learning techniques. We show how to train our model on synthetic data and retain the performance on real samples during test time. To evaluate our method, we provide a new dataset NNHand RGBD of short video sequences and show encouraging performance compared to diverse baselines on the new data, as well as current benchmark dataset HKPolyU. Moreover, the new dataset opens door to many new research directions in hand shape recognition. 



(#39 / Sess. 1) Hierarchically Attentive Graph Pooling with Subgraph Attention
(
Poster Teaser
)
>
SlidesLive Video Graph neural networks get significant attention for graph representation and classification in machine learning community. Different types of neighborhood aggregation and pooling strategies have been proposed in the literature. In this work, we introduce a higher order hierarchical GNN algorithm (SubGattPool) by employing (i) an attention mechanism which learns the importance and aggregates neighboring subgraphs of a node instead of firstorder neighbors, and (ii) a hierarchical pooling strategy which learns the importance of different hierarchies in a GNN. SubGattPool is able to achieve stateoftheart graph classification performance on multiple realworld datasets. 



(#96 / Sess. 1) Active Learning on Graphs via Meta Learning
(
Poster Teaser
)
>
SlidesLive Video Active learning (AL) for semisupervised node classification aims to reduce the number of labeled instances by selecting only the most informative nodes for labeling. The AL algorithms designed for other data types such as images and text do not perform well on graphstructured data. Although a few heuristicsbased AL algorithms have been proposed for graphs, a principled approach is lacking. We propose MetAL, an AL algorithm that selects unlabeled items that directly improve the future performance of a graph neural network (GNN) model. We formulate the AL problem as a bilevel optimization problem. Based on recent work in metalearning, we compute the metagradients to approximate the impact of unlabeled instances on the model uncertainty. We empirically demonstrate that MetAL outperforms existing AL algorithms. 



(#40 / Sess. 2) HNHN: Hypergraph Networks with Hyperedge Neurons
(
Poster Teaser
)
>
SlidesLive Video Hypergraphs provide a natural representation for many real world datasets. We propose a novel framework, HNHN, for hypergraph representation learning. HNHN is a hypergraph convolution network with nonlinear activation functions applied to both hypernodes and hyperedges, combined with a normalization scheme that can flexibly adjust the importance of highcardinality hyperedges and highdegree vertices depending on the dataset. We demonstrate improved performance of HNHN in both classification accuracy and speed on real world datasets when compared to state of the art methods. 



(#95 / Sess. 2) Graphein  a Python Library for Geometric Deep Learning and Network Analysis on Protein Structures
(
Poster Teaser
)
>
SlidesLive Video Graphein is a python library for constructinggraph and surfacemesh representations ofprotein structures for computational analysis. The library interfaces with popular geometric deep learning libraries: DGL, PyTorch Geometric and PyTorch3D. Geometric deep learning is emerging as a popular methodology in computational structural biology. As feature engineering is a vital step in a machine learning project, the library is designed to be highly flexible, allowing the user to parameterise the graph construction, scalable to facilitate working with large protein complexes, and containing useful preprocessing tools for preparing experimental structure files. Graphein is also designed to facilitate networkbased and graphtheoretic analyses of protein structures in a highthroughput manner. As example workflows, we make available two new protein structurerelated datasets, previously unused by the geometric deep learning community. 



(#43 / Sess. 2) Uncovering the Folding Landscape of RNA Secondary Structure with Deep Graph Embeddings
(
Poster Teaser
)
>
SlidesLive Video Biomolecular graph analysis has recently gained much attention in the emerging field of geometric deep learning. Here we focus on organizing biomolecular graphs in ways that expose meaningful relations and variations between them. We propose a geometric scattering autoencoder (GSAE) network for learning such graph embeddings. Our embedding network first extracts rich graph features using the recently proposed geometric scattering transform. Then, it leverages a semisupervised variational autoencoder to extract a lowdimensional embedding that retains the information in these features that enable prediction of molecular properties as well as characterize graphs. We show that GSAE organizes RNA graphs both by structure and energy, accurately reflecting bistable RNA structures. Also, the model is generative and can sample new folding trajectories. 



(#94 / Sess. 2) Are Hyperbolic Representations in Graphs Created Equal?
(
Poster Teaser
)
>
SlidesLive Video Recently there was an increasing interest in applications of graph neural networks in nonEuclidean geometry; however, are nonEuclidean representations always useful for graph learning tasks? For different problems such as node classification and link prediction we compute hyperbolic embeddings and conclude that for tasks that require global prediction consistency it might be useful to use nonEuclidean embeddings, while for other tasks Euclidean models are superior. To do so we first fix an issue of the existing models associated with the optimization process at zero curvature. Current hyperbolic models deal with gradients at the origin in adhoc manner, which is inefficient and can lead to numerical instabilities. We solve the instabilities of kappaStereographic model at zero curvature cases and evaluate the approach of embedding graphs into the manifold in several graph representation learning tasks. 



(#45 / Sess. 1) Hierarchical InterMessage Passing for Learning on Molecular Graphs
(
Poster Teaser
)
>
SlidesLive Video We present a hierarchical neural message passing architecture for learning on molecular graphs. Our model takes in two complementary graph representations: the raw molecular graph representation and its associated junction tree, where nodes represent meaningful clusters in the original graph, e.g., rings or bridged compounds. We then proceed to learn a molecule's representation by passing messages inside each graph, and exchange messages between the two representations using a coarsetofine and finetocoarse information flow. Our method is able to overcome some of the restrictions known from classical GNNs, like detecting cycles, while still being very efficient to train. We validate its performance on the ZINC dataset and datasets stemming from the MoleculeNet benchmark collection. 



(#93 / Sess. 1) Geoopt: Riemannian Optimization in PyTorch
(
Poster Teaser
)
>
SlidesLive Video Geoopt is a researchoriented modular opensource package for Riemannian Optimization in PyTorch. The core of Geoopt is a standard Manifold interface that allows for the generic implementation of optimization algorithms. Geoopt supports basic Riemannian SGD as well as adaptive optimization algorithms. Geoopt also provides several algorithms and arithmetic methods for supported manifolds, which allow composing geometryaware neural network layers that can be integrated with existing models. 



(#46 / Sess. 2) Discrete Planning with Endtoend Trained Neuroalgorithmic Policies
(
Poster Teaser
)
>
SlidesLive Video Although modelbased and modelfree approaches to learning the control of systems have achieved impressive results on standard benchmarks, most have been shown to be lacking in their generalization capabilities. These methods usually require sampling an exhaustive amount of data from different environment configurations. We propose a hybrid policy architecture with a deep network and a shortest path planner working in unison. The model can be trained endtoend via blackboxdifferentiation. The deep network learns to predict timedependent waycosts such that internal plans match expert trajectories. These neuroalgorithmic policies generalize well to unseen environment configurations. 



(#92 / Sess. 1) From Graph LowRank Global Attention to 2FWL Approximation
(
Poster Teaser
)
>
SlidesLive Video Graph Neural Networks are known to have an expressive power bounded by that of the vertex coloring algorithm \cite{xu2018how, morris2018weisfeiler}. However, for rich node features, such a bound does not exist and GNNs can be shown to be universal. Unfortunately, expressive power alone does not imply good generalization. We suggest the LowRank Global Attention (LRGA) module, taking advantage of the efficiency of low rank matrixvector multiplication, that improves the algorithmic alignment \cite{Xu2019algoalign} of GNNs with the more powerful 2folklore WeisfeilerLehman (FWL) algorithm. Furthermore, we provide a sample complexity bound for the module using kernel feature map interpretation of 2FWL. Empirically, augmenting existing GNN layers with LRGA produces state of the art results on most datasets in a GNN standard benchmark. 



(#90 / Sess. 1) Pointer Graph Networks
(
Poster Teaser
)
>
SlidesLive Video Graph neural networks (GNNs) are typically applied to static graphs that are assumed to be known upfront. This static input structure is often informed purely by insight of the machine learning practitioner, and might not be optimal for the actual task the GNN is solving. We introduce Pointer Graph Networks (PGNs) which augment sets or graphs with additional inferred edges for improved model expressivity. PGNs allow each node to dynamically point to another node, followed by message passing over these pointers. Despite its sparsity, this adaptable graph structure proves sufficiently expressive to simulate complex algorithms. The pointing mechanism is supervised to model longterm sequences of operations on classical data structures. PGNs can learn parallelisable variants of pointerbased data structures, and generalise outofdistribution to 5x larger test inputs on dynamic graph connectivity tasks, outperforming unrestricted GNNs and Deep Sets. 



(#50 / Sess. 1) Graph Convolutional Gaussian Processes for Link Prediction
(
Poster Teaser
)
>
SlidesLive Video Link prediction aims to reveal missing edges in a graph. We address this task with a deep graph convolutional Gaussian process model. The Gaussian process is transformed using simplified graph convolutions to better leverage the topological information of the graph domain. To scale the Gaussian process model to larger graphs, we introduce a variational inducing point method that places pseudoinputs on a graphstructured domain. The proposed model represents the first Gaussian process for link prediction that can make use of both node features and topological information. We evaluate our model on three graph data sets with up to thousands of nodes and report consistent improvements over existing Gaussian process models and stateoftheart graph neural network approaches. 



(#89 / Sess. 1) Graphs, Entities, and Step Mixture
(
Poster Teaser
)
>
SlidesLive Video Existing approaches for graph neural networks commonly suffer from the oversmoothing issue, regardless of how neighborhoods are aggregated. Most methods also focus on transductive scenarios for fixed graphs, leading to poor generalization for unseen graphs. To address these issues, we propose a new graph neural network that considers both edgebased neighborhood relationships and nodebased entity features, i.e.Graph Entities with Step Mixture via random walk (GESM). GESM employs a mixture of various steps through random walk to alleviate the oversmoothing problem, attention to dynamically reflect interrelations depending on node information, and structurebased regularization to enhance embedding representation. With intensive experiments, we show that the proposed GESM achieves stateoftheart or comparable performances on eight benchmark graph datasets. 



(#51 / Sess. 1) Deep Lagrangian Propagation in Graph Neural Networks
(
Poster Teaser
)
>
SlidesLive Video Graph Neural Networks (Scarselli et al., 2009) exploit an iterative diffusion procedure to compute the node states as the fixed point of the trainable state transition function. In this paper, we show how to cast this scheme as a constrained optimization problem, thus avoiding the unfolding procedure required for the computation of the fixed point. This is done by searching for saddle points of the Lagrangian function in the space of the weights, state variables and Lagrange multipliers. The proposed approach shows stateoftheart performance in multiple standard benchmarks in graph domains. 



(#87 / Sess. 2) BiLevel Attention Neural Architectures for Relational Data
(
Poster Teaser
)
>
SlidesLive Video We present BiLevel AttentionBased Relational Graph Convolutional Networks (BRGCN), unique neural network architectures that utilize masked selfattentional layers with relational graph convolutions, to effectively operate on highly multirelational data. BRGCN models use bilevel attention to learn node embeddings through (1) nodelevel attention, and (2) relationlevel attention. BRGCN’s nodelevel selfattentional layers use intrarelational graph interactions to learn relationspecific node embeddings using a weighted aggregation of neighborhood features in a sparse subgraph region. BRGCN's relationlevel selfattentional layers use interrelational graph interactions to learn the final node embeddings using a weighted aggregation of relationspecific node embeddings. BRGCN's bilevel attention mechanism extends Transformerbased multiplicative attention from the natural language processing (NLP) domain, and Graph Attention Networks (GAT)based attention, to largescale heterogeneous graphs (HGs). On node classification, BRGCN outperforms baselines from 0.29% to 14.95%, and on the link prediction task, outperforms baselines from 0.02% to 7.40% and suggests to enrich HG embedding models. We also conduct ablation studies to evaluate the quality of BRGCN's relationlevel attention and discuss how its learning of graph structure may be transferred to enrich other Graph Neural Networks (GNNs). Through various experiments, we show that BRGCN's attention mechanism is both scalable and more effective in learning compared to stateoftheart GNNs. 



(#53 / Sess. 1) Scene Graph Reasoning for Visual Question Answering
(
Poster Teaser
)
>
SlidesLive Video Visual question answering is concerned with answering freeform questions about an image. Since it requires a deep linguistic understanding of the question and the ability to associate it with various objects that are present in the image, it is an ambitious task and requires techniques from both computer vision and natural language processing. We propose a novel method that approaches the task by performing contextdriven, sequential reasoning based on the objects and their semantic and spatial relationships present in the scene. As a first step, we derive a scene graph which describes the objects in the image, as well as their attributes and their mutual relationships. A reinforcement agent then learns to autonomously navigate over the extracted scene graph to generate paths, which are then the basis for deriving answers. We conduct a first experimental study on the challenging GQA dataset with manually curated scene graphs, where our method reaches the level of human performance. 



(#86 / Sess. 2) Graph Generation with EnergyBased Models
(
Poster Teaser
)
>
SlidesLive Video We present a set of novel, energybased models built on top of graph neural networks (GNNEBMs) to estimate the unnormalized density of a distribution of graphs. GNNEBMs can generate graphs implicitly via MCMC sampling. We compare the performance of GNNEBMs trained using 3 different estimators: pseudolikelihood, conditional noise contrastive estimation, and persistent contrastive divergence (PCD). We find that all 3 estimators result in models that generalize well, while models trained with PCD generate samples that are competitive with stateoftheart baselines. Finally, we discuss the potential of GNNEBMs beyond generation for diverse tasks such as semisupervised learning and outlier detection. 



(#54 / Sess. 2) SE(3)Transformers: 3D RotoTranslation Equivariant Attention Networks
(
Poster Teaser
)
>
SlidesLive Video We introduce the SE(3)Transformer, a variant of the selfattention module for 3D point clouds, which is equivariant under continuous 3D rototranslations. Equivariance is important to ensure stable and predictable performance in the presence of nuisance transformations of the data input. A positive corollary of equivariance is increased weighttying within the model, leading to fewer trainable parameters and thus decreased sample complexity (i.e. we need less training data). The SE(3)Transformer leverages the benefits of selfattention to operate on large point clouds with varying number of points while guaranteeing SE(3)equivariance for robustness. We achieve competitive performance on two realworld datasets, ScanObjectNN and QM9. 



(#84 / Sess. 2) UniKER: A Unified Framework for Combining Embedding and Horn Rules for Knowledge Graph Inference
(
Poster Teaser
)
>
SlidesLive Video Combine KGE and logical rules for better KG inference have gained increasing attention in recent years. Unfortunately, a majority of existing employ sampling strategies to randomly select only a small portion of ground rules or hidden triples, thus only partially leverage the power of logical rules in reasoning. In this paper, we propose a novel framework UniKER to address this challenge by restricting logical rules to be Horn rules, which can fully exploit the knowledge in logical rules and enable the mutual enhancement of logical rulebased reasoning and KGE in an extremely efficient way. Extensive experiments have demonstrated that our approach is superior to existing stateoftheart algorithms in terms of both efficiency and effectiveness. 



(#83 / Sess. 2) Connecting Graph Convolutional Networks and GraphRegularized PCA
(
Poster Teaser
)
>
Graph convolution operator of the GCN model is originally motivated from a localized firstorder approximation of spectral graph convolutions. This work stands on a different view; establishing a connection between graph convolution and graphregularized PCA. Based on this connection, GCN architecture, shaped by stacking graph convolution layers, shares a close relationship with stacking graphregularized PCA (GPCA). We empirically demonstrate that the unsupervised embeddings by GPCA paired with a logistic regression classifier achieves similar performance to GCN on semisupervised node classification tasks. Further, we capitalize on the discovered relationship to design an effective initialization strategy for GCN based on stacking GPCA. 



(#55 / Sess. 1) Uncertainty in Neural Relational Inference Trajectory Reconstruction
(
Poster Teaser
)
>
SlidesLive Video Neural networks used for multiinteraction trajectory reconstruction lack the ability to estimate the uncertainty in their outputs, which would be useful to better analyse and understand the systems they model. In this paper we extend the Factorised Neural Relational Inference model to output both a mean and a standard deviation for each component of the phase space vector, which together with an appropriate loss function, can account for uncertainty. A variety of loss functions are investigated including ideas from convexification and a Bayesian treatment of the problem. We show that the physical meaning of the variables is important when considering the uncertainty and demonstrate the existence of pathological local minima that are difficult to avoid during training. 



(#82 / Sess. 2) Scattering GCN: Overcoming Oversmoothness in Graph Convolutional Networks
(
Poster Teaser
)
>
SlidesLive Video Graph convolutional networks (GCNs) are widely used for semisupervised node classification on graphs today. The graph structure is however only accounted for by considering the similarity of activations between adjacent nodes, in turn degrading the results. In this work, we augment GCN models by incorporating richer notions of regularity by leveraging cascades of bandpass filters, known as geometric scatterings. We introduce a new hybrid architecture for the task and demonstrate its potential on multiple graph datasets, where it outperforms leading GCN models. 



(#58 / Sess. 1) Temporal Graph Networks for Deep Learning on Dynamic Graphs
(
Poster Teaser
)
>
SlidesLive Video Graph Neural Networks (GNNs) have become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that present some sort of dynamic nature (e.g. evolving features or connectivity over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs. Thanks to a novel combination of memory modules and graphbased operators, TGNs are able to significantly outperform previous approaches, being at the same time more computationally efficient. 



(#80 / Sess. 2) Weisfeiler and Leman go sparse: Towards scalable higherorder graph embeddings
(
Poster Teaser
)
>
SlidesLive Video The 1dimensional WeisfeilerLeman (WL) algorithm has recently emerged as a powerful tool for analysis and design of kernels and neural architectures for (supervised) learning with graphs. However, due to the purely local nature of the algorithms, they might miss essential patterns. Here, the kdimensional WL accounts for the higherorder interactions between vertices by defining a suitable notion of adjacency between ktuples of vertices. However, it does not scale and may suffer from overfitting when used in a machine learning setting. We circumvent these issues by proposing new local variants of the kWL and corresponding neural architectures, which consider a subset of the original neighborhood. The expressive power of (one of) our algorithms is strictly higher than the original kWL with no losses in scalability. In fact, for sparse graphs, the scalability improves by several orders of magnitude. The kernel version establishes a new stateoftheart for graph classification on a wide range of benchmark datasets, while the neural version shows promising performance on largescale molecular regression tasks. 



(#62 / Sess. 2) Relate and Predict: StructureAware Prediction with Jointly Optimized Neural Dependency Graph
(
Poster Teaser
)
>
SlidesLive Video Understanding relationships between feature variables is one important way humans use to make decisions. However, stateoftheart deep learning studies either focus on taskagnostic statistical dependency learning or do not model explicit feature dependencies during prediction. We propose a deep neural network framework, dGAP, to learn neural dependency Graph and optimize structureAware target Prediction simultaneously. dGAP trains towards a structure selfsupervision loss and a target prediction loss jointly. Our method leads to an interpretable model that can disentangle sparse feature relationships, informing the user how relevant dependencies impact the target task. We empirically evaluate dGAP on multiple simulated and real datasets. dGAP is not only accurate, but can also recover correct dependency structure. 



(#79 / Sess. 1) TUDataset: A collection of benchmark datasets for learning with graphs
(
Poster Teaser
)
>
SlidesLive Video Recently, there has been an increasing interest in (supervised) learning with graph data, especially using graph neural networks. However, the development of meaningful benchmark datasets and standardized evaluation procedures is lagging, consequently hindering advancements in this area. To address this, we introduce the TUDataset for graph classification and regression. The collection consists of over 120 datasets of varying sizes from a wide range of applications. We provide Pythonbased data loaders, kernel and graph neural network baseline implementations, and evaluation tools. Here, we give an overview of the datasets, standardized evaluation procedures, and provide baseline experiments. All datasets are available at www.graphlearning.io. The experiments are fully reproducible from the code available at www.github.com/chrsmrrs/tudataset. 



(#63 / Sess. 2) Stay Positive: Knowledge Graph Embedding Without Negative Sampling
(
Poster Teaser
)
>
SlidesLive Video Knowledge graphs (KGs) are typically incomplete and we often wish to infer new facts given the existing ones. This can be thought of as a binary classification problem; we aim to predict if new facts are true or false. Unfortunately, we generally only have positive examples (the known facts) but we also need negative ones to train a classifier. To resolve this, it is usual to generate negative examples using a negative sampling strategy. However, this can produce false negatives which may reduce performance, is computationally expensive, and does not produce calibrated classification probabilities. In this paper, we propose a training procedure that obviates the need for negative sampling by adding a novel regularization term to the loss function. Our results for two relational embedding models (DistMult and SimplE) show the merit of our proposal both in terms of performance and speed. 



(#77 / Sess. 1) SIGN: Scalable Inception Graph Neural Networks
(
Poster Teaser
)
>
SlidesLive Video The popularity of graph neural networks has sparked interest, both in academia and in industry, in developing methods that scale to very large graphs such as Facebook or Twitter social networks. In most of these approaches, the computational cost is alleviated by a sampling strategy retaining a subset of node neighbors or subgraphs at training time. In this paper we propose a new, efficient and scalable graph deep learning architecture, which sidesteps the need for graph sampling by using graph convolutional filters of different size that are amenable to efficient precomputation, allowing extremely fast training and inference. Our architecture allows using different local graph operators (e.g. motifinduced adjacency matrices or Personalized Page Rank diffusion matrix) to best suit the task at hand. We conduct extensive experimental evaluation on various open benchmarks and show that our approach is competitive with other stateoftheart architectures, while requiring a fraction of training and inference time. 



(#76 / Sess. 1) Graph Clustering with Graph Neural Networks
(
Poster Teaser
)
>
SlidesLive Video Graph Neural Networks (GNNs) have achieved stateoftheart results on many graph analysis tasks such as node classification and link prediction. However, important unsupervised problems on graphs, such as graph clustering, have proved more resistant to advances in GNNs. In this paper, we study unsupervised training of GNN pooling in terms of their clustering capabilities. We draw a connection between graph clustering and graph pooling: intuitively, a good graph clustering is what one would expect from a GNN pooling layer. Counterintuitively, we show that this is not true for stateoftheart pooling methods, such as MinCut pooling. To address these deficiencies, we introduce Deep Modularity Networks (DMoN), an unsupervised pooling method inspired by the modularity measure of clustering quality, and show how it tackles recovery of the challenging clustering structure of graphs. In order to clarify the regimes where existing methods fail, we carefully design a set of experiments on synthetic data which show that DMoN is able to jointly leverage the signal from the graph structure and node features. Similarly, on realworld data, we show that DMoN produces high quality clusters which correlate strongly with ground truth labels, achieving stateoftheart results. 



(#64 / Sess. 1) Differentiable Graph Module (DGM) for Graph Convolutional Networks
(
Poster Teaser
)
>
SlidesLive Video Graph deep learning has recently emerged as a powerful ML concept allowing to generalize successful deep neural architectures to nonEuclidean structured data. Such methods have shown promising results on a broad spectrum of applications ranging from social science, biomedicine, and particle physics to computer vision, graphics, and chemistry. One of the limitations of the majority of the current graph neural network architectures is that they are often restricted to the transductive setting and rely on the assumption that the underlying graph is known and fixed. In many settings, such as those arising in medical and healthcare applications, this assumption is not necessarily true since the graph may be noisy, partially or even completely unknown, and one is thus interested in inferring it from the data. This is especially important in inductive settings when dealing with nodes not present in the graph at training time. In this paper, we introduce Differentiable Graph Module (DGM), a learnable function predicting the edge probability in the graph relevant for the task, that can be combined with convolutional graph neural network layers and trained in an endtoend fashion. We provide an extensive evaluation of applications from the domains of healthcare (disease prediction) and brain imaging (age prediction). We show that our model provides a significant improvement over baselines both in transductive and inductive settings and achieves stateoftheart results. 



(#75 / Sess. 1) Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting
(
Poster Teaser
)
>
SlidesLive Video Graph Neural Networks (GNNs) have been recently found to suffer from important limitations regarding their ability to capture the structure of the underlying graph. It has been shown that the expressive power of standard GNNs is bounded by the WeisfeilerLehman (WL) graph isomorphism test, from which they inherit proven limitations such as the inability to detect and count graph substructures. On the other hand, there is significant empirical evidence that substructures are often informative for downstream tasks, suggesting that it is desirable to design GNNs capable of leveraging this important source of information. To this end, we propose a novel topologicallyaware message passing scheme based on subgraph isomorphism counting. We show that our architecture allows incorporating domainspecific inductive biases and that it is strictly more expressive than the WL test, while being able to disambiguate even hard instances of graph isomorphism. In contrast to recent works, our approach does not attempt to adhere to the WL hierarchy and therefore retains the locality and linear complexity of standard GNNs. 



(#65 / Sess. 2) Software Engineering Event Modeling using Relative Time in Temporal Knowledge Graphs
(
Poster Teaser
)
>
SlidesLive Video We present a multirelational temporal Knowledge Graph based on the daily interactions between artifacts in GitHub, one of the largest social coding platforms. Such representation enables posing many useractivity and project management questions as link prediction and time queries over the knowledge graph. In particular, we introduce two new datasets for i) interpolated timeconditioned link prediction and ii) extrapolated timeconditioned link/time prediction queries, each with distinguished properties. Our experiments on these datasets highlight the potential of adapting knowledge graphs to answer broad software engineering questions. Meanwhile, it also reveals the unsatisfactory performance of existing temporal models on extrapolated queries and time prediction queries in general. To overcome these shortcomings, we introduce an extension to current temporal models using relative temporal information with regards to past events. 



(#66 / Sess. 2) BiLevel Graph Neural Networks for DrugDrug Interaction Prediction
(
Poster Teaser
)
>
SlidesLive Video We introduce BiGNN for modeling biological link prediction tasks such as drugdrug interaction (DDI) and proteinprotein interaction (PPI). Taking drugdrug interaction as an example, existing methods using machine learning either only utilize the link structure between drugs without using the graph representation of each drug molecule, or only leverage the individual drug compound structures without using graph structure for the higherlevel DDI graph. The key idea of our method is to fundamentally view the data as a bilevel graph, where the highest level is a representing the interaction between biological entities (interaction graph), and each biological entity itself is further expanded to its intrinsic graph representation (representation graphs), where the graph is either flat like a drug compound or hierarchical like a protein with amino acid level graph, secondary structure, tertiary structure, etc. Our model not only allows the usage of information from both the highlevel interaction graph and the lowlevel representation graphs, but also offers a baseline for future research opportunities to address the bilevel nature of the data. 



(#74 / Sess. 2) RelationDependent Sampling for MultiRelational Link Prediction
(
Poster Teaser
)
>
SlidesLive Video Multirelational graphs contain various types of relations that usually come with variable frequency and have different importance for the problem at hand. Existing graph sampling approaches ignore the multirelational nature of such graphs. We propose an approach to modeling the importance of relation types for sampling and show that we can learn the right balance: relationtype probabilities that reflect both frequency and importance. We use relationdependent sampling to develop a scalable graph neural network and apply it for multirelational link prediction. Our experiments specifically consider drugdrug interaction (DDI) prediction, an important task during drug development. In that context, we further show the benefit of considering a special relation type, negative edges. Synergistic drug combinations (e.g., drugs that synergize by improved efficacy and reduced side effects) can be regarded as negative evidence for DDIs (which often indicate adverse reactions). We add drug synergy data to provide extra expert knowledge which can be easily integrated into our model and yields improved performance. Teaser video  Veronika Thost · Arthur Feeney · Rishabh Gupta 🔗 



(#67 / Sess. 2) Continuous Graph Flow
(
Poster Teaser
)
>
SlidesLive Video In this paper, we propose Continuous Graph Flow, a generative continuous flow based method that aims to model complex distributions of graphstructured data. Our proposed model learns a joint probability density over a set of related random variables by formulating it as first order ordinary differential equation system with shared and reusable functions that operate over the graph structure. This leads to a reversible continuous message passing over time resulting in continuous transformations of probability distributions of the variables. We evaluate our model on a diverse set of generation tasks: graph generation, image puzzle generation, and layout generation from scene graphs. Experimental results show that CGFbased models outperform stateoftheart graph generative models. 



(#70 / Sess. 2) Molecule Edit Graph Attention Network: Modeling Chemical Reactions as Sequences of Graph Edits
(
Poster Teaser
)
>
SlidesLive Video One of the key challenges in automated chemical synthesis planning is to propose diverse and reliable reactions. A common approach is to generate reactions using reaction templates, which represent a reaction as a fixed graph transformation. This enables accurate and interpretable predictions but can suffer from limited diversity. On the other hand, templatefree methods increase diversity but can be prone to making trivial mistakes. Inspired by the efficacy of reaction templates, we propose Molecule Edit Graph Attention Network (MEGAN), a templatefree model that encodes reaction as a sequence of graph edits. Our model achieves stateoftheart results on a standard retrosynthesis benchmark without any manual rule encoding. 



(#73 / Sess. 2) Evaluating Logical Generalization in Graph Neural Networks
(
Poster Teaser
)
>
SlidesLive Video Recent research has highlighted the role of relational inductive biases in building learning agents that can generalize and reason in a compositional manner. However, while relational learning algorithms such as graph neural networks (GNNs) show promise, we do not understand how effectively these approaches can adapt to new tasks. In this work, we study the task of logical generalization using GNNs by designing a benchmark suite grounded in firstorder logic. Our benchmark suite, GraphLog, requires that learning algorithms perform rule induction in different synthetic logics, represented as knowledge graphs. GraphLog consists of relation prediction tasks on 57 distinct logical domains. We use GraphLog to evaluate GNNs in three different setups: singletask supervised learning, multitask pretraining, and continual learning. Unlike previous benchmarks, our approach allows us to precisely control the logical relationship between the different tasks. We find that the ability for models to generalize and adapt is strongly determined by the diversity of the logical rules they encounter during training, and our results highlight new challenges for the design of GNN models. 



(#71 / Sess. 1) Population Graph GNNs for Brain Age Prediction
(
Poster Teaser
)
>
SlidesLive Video Many common neurological and neurodegenerative disorders, such as Alzheimer’s disease, dementia and multiple sclerosis, have been associated with abnormal patterns of apparent ageing of the brain. Discrepancies between the estimated brain age and the actual chronological age (brain age gaps) can be used to understand the biological pathways behind the ageing process, assess an individual’s risk for various brain disorders and identify new personalised treatment strategies. By flexibly integrating minimally preprocessed neuroimaging and nonimaging modalities into a population graph data structure, we train two types of graph neural network (GNN) architectures to predict brain age in a clinically relevant fashion as well as investigate their robustness to noisy inputs and graph sparsity. The population graph approach has the potential to learn from the entire cohort of healthy and affected subjects of both sexes at once, capturing a wide range of confounding effects and detecting variations in brain age trends between different subpopulations of subjects. 