Machine learning (ML) has revolutionized a wide array of scientific disciplines, including chemistry, biology, physics, material science, neuroscience, earth science, cosmology, electronics, and mechanical science. It has addressed scientific challenges that had long remained unsolved, such as predicting 3D protein structure, imaging black holes, and automating drug discovery. Despite this promise, several critical gaps stifle algorithmic and scientific innovation in AI for Science: (1) underexplored theoretical analysis, (2) unrealistic methodological assumptions or directions, (3) overlooked scientific questions, (4) limited exploration at the intersections of multiple disciplines, (5) science of science, and (6) responsible use and development of AI for science. However, very little work has been done to bridge these gaps, mainly because of the missing link between distinct scientific communities. While many workshops focus on AI for specific scientific disciplines, they are all concerned with methodological advances within a single discipline (e.g., biology) and are thus unable to examine the crucial questions mentioned above. This workshop will fulfill this unmet need and facilitate community building; with hundreds of ML researchers beginning projects in this area, the workshop will bring them together to consolidate the fast-growing area of AI for Science into a recognized field.

Sun 6:00 a.m. - 6:10 a.m.: Opening Remarks
Sun 6:10 a.m. - 6:50 a.m.: Frank Noe (Talk)
Sun 6:50 a.m. - 7:30 a.m.: Rafael Gomez-Bombarelli (Talk)
Sun 7:45 a.m. - 8:00 a.m.: Efficient Continuous Spatio-Temporal Simulation with Graph Spline Networks, Chuanbo Hua (Contributed Talk)
Sun 8:00 a.m. - 8:40 a.m.: Daphne Koller (Talk)
Sun 8:40 a.m. - 10:00 a.m.: Poster Session (Poster)
Sun 10:00 a.m. - 10:40 a.m.: Animashree Anandkumar (Talk)
Sun 10:40 a.m. - 10:55 a.m.: Learning to solve PDE constraint inverse problem using Graph Network, Qingqing Zhao (Contributed Talk)
Sun 10:55 a.m. - 11:35 a.m.: Anthony Gitter (Talk)
Sun 11:55 a.m. - 12:35 p.m.: Jiequn Han (Talk)
Sun 12:35 p.m. - 12:50 p.m.: Understanding the evolution of tumours using hybrid deep generative models, Tom Ouellette (Contributed Talk)
Sun 12:50 p.m. - 1:30 p.m.: Carla P. Gomes (Talk)
Sun 1:50 p.m. - 2:05 p.m.: A Density Functional Recommendation Approach for Accurate Predictions of Vertical Spin Splitting of Transition Metal Complexes, Chenru Duan (Contributed Talk)
Sun 2:05 p.m. - 2:45 p.m.: Max Tegmark (Talk)
Sun 2:45 p.m. - 3:00 p.m.: Closing Remarks


Efficient Continuous Spatio-Temporal Simulation with Graph Spline Networks
(Poster, Oral)
Complex simulation of physical systems is an invaluable tool for a large number of fields, including engineering and scientific computing. To overcome the computational requirements of high-accuracy solvers, learned graph neural network simulators have recently been introduced. However, these methods often require a large number of nodes and edges, which can hinder their performance. Moreover, they cannot evaluate continuous solutions in space and time due to their inherently discretized structure. In this paper, we propose GraphSplineNets, a method based on graph neural networks and orthogonal spline collocation (OSC) to accelerate learned simulations of physical systems by interpolating solutions of graph neural networks. First, we employ an encoder-decoder message passing graph neural network to map the location and value of nodes from the physical domain to hidden space and learn to predict future values. Then, to realize fully continuous simulations over the domain without dense sampling of nodes, we post-process predictions with OSC. This strategy allows us to produce a solution at any location in space and time without explicit prior knowledge of the underlying differential equations and with a lower computational burden compared to learned graph simulators evaluating more space-time locations. We evaluate the performance of our approach on heat equation, dam breaking, and flag simulations with different graph neural network baselines. Our method is consistently Pareto efficient in terms of simulation accuracy and inference time, e.g., a 3x speedup with 10% less error on flag simulation.
Chuanbo Hua · Federico Berto · Michael Poli · Stefano Massaroli · Jinkyoo Park


LinkBERT: Language Model Pretraining with Document Link Knowledge
(Poster)
Language model (LM) pretraining can learn various knowledge from text corpora, helping downstream tasks. However, existing methods such as BERT model a single document and do not capture dependencies or knowledge that span across documents. In this work, we propose LinkBERT, an LM pretraining method that leverages links between documents, e.g., hyperlinks and citation links. Given a text corpus, we view it as a graph of documents and create LM inputs by placing linked documents in the same context. We then pretrain the LM with two joint self-supervised objectives: masked language modeling and our new proposal, document relation prediction. We show that LinkBERT outperforms BERT on diverse downstream tasks across both the general domain (pretrained on Wikipedia with hyperlinks) and the biomedical domain (pretrained on PubMed with citation links). In particular, LinkBERT is effective for knowledge- and reasoning-intensive tasks such as multi-hop reasoning and few-shot inference (+7% absolute gain on BioASQ and MedQA), and achieves new state-of-the-art results on various biomedical NLP tasks including relation extraction and literature classification. Our results suggest the promise of LinkBERT for scientific applications.
Michihiro Yasunaga · Jure Leskovec · Percy Liang


Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Biomolecular Structures and Interaction Networks
(Poster)
Geometric deep learning has broad applications in biology, a domain where relational structure in data is often intrinsic to modelling the underlying phenomena. Currently, efforts in both geometric deep learning and, more broadly, deep learning applied to biomolecular tasks have been hampered by a scarcity of appropriate datasets accessible to domain specialists and machine learning researchers alike. To address this, we introduce Graphein as a turn-key tool for transforming raw data from widely-used bioinformatics databases into machine learning-ready datasets in a high-throughput and flexible manner. Graphein is a Python library for constructing graph and surface-mesh representations of biomolecular structures, such as proteins, nucleic acids and small molecules, and biological interaction networks for computational analysis and machine learning. Graphein provides utilities for data retrieval from widely-used bioinformatics databases for structural data, including the Protein Data Bank, the AlphaFold Structure Database, chemical data from ZINC and ChEMBL, and for biomolecular interaction networks from STRINGdb, BioGrid, TRRUST and RegNetwork. The library interfaces with popular geometric deep learning libraries (DGL, Jraph, PyTorch Geometric and PyTorch3D), though it remains framework agnostic as it is built on top of the PyData ecosystem to enable interoperability with scientific computing tools and libraries. Graphein is designed to be highly flexible, allowing the user to specify each step of the data preparation, scalable to facilitate working with large protein complexes and interaction graphs, and contains useful pre-processing tools for preparing experimental files. Graphein facilitates network-based, graph-theoretic and topological analyses of structural and interaction datasets in a high-throughput manner. We envision that Graphein will facilitate developments in computational biology, graph representation learning and drug discovery.
Availability and implementation: Graphein is written in Python. Source code, example usage and tutorials, datasets, and documentation are made freely available under the MIT License at the following URL: https://anonymous.4open.science/r/graphein3472/README.md
Arian Jamasb · Ramon Viñas Torné · Eric Ma · Yuanqi Du · Charles Harris · Kexin Huang · Dominic Hall · Pietro Lió · Tom Blundell


Understanding the evolution of tumours using hybrid deep generative models
(Poster, Oral)
Understanding both the population or subclonal structure and the evolutionary forces that drive tumour evolution has important clinical implications for patients. However, deconvoluting subclonal structure and performing evolutionary parameter inference have largely been treated as two independent or step-wise tasks. Here, we show that combining stochastic simulations with hybrid deep generative models enables joint inference of subclonal structure and evolutionary parameter estimates. Ultimately, by jointly learning these two tasks, we show that our proposed approach leads to improved performance across a multitude of cancer evolution tasks including, but not limited to, detecting subclones, quantifying subclone frequency, and estimating mutation rate. As an additional benefit, we show that hybrid deep generative models also provide substantial reductions in inference time relative to existing methods.
Tom Ouellette · Philip Awadalla


On the Relationships between Graph Neural Networks for the Simulation of Physical Systems and Classical Numerical Methods
(Poster)
Recent developments in Machine Learning approaches for modelling physical systems have begun to mirror the past development of numerical methods in the computational sciences. In this survey, we begin by providing an example of this development with the parallels between graph neural network acceleration for physical simulations and the development of particle-based approaches. We then give an overview of simulation approaches that have not yet found their way into state-of-the-art Machine Learning methods and hold the potential to make Machine Learning approaches more accurate and more efficient. We conclude by presenting an outlook on the potential of these approaches for making Machine Learning models for science more efficient.
Artur Toshev · Ludger Paehler · Andrea Panizza · Nikolaus Adams


Intelligent Digital Twins can Accelerate Scientific Discovery and Control Complex Multi-Physics Processes
(Poster)
The emerging area of Intelligent Digital Twins (IDTs) offers great potential as a new paradigm for accelerating scientific discovery, while also offering state-of-the-art functionality in controlling complex physical processes. We investigate this concept for the case of an Intelligent Digital Twin of metal additive manufacturing (AM). Metal AM is an excellent choice for utilising an IDT because the process is an inherently complex multi-physics one, with key elements including granular powder flow, laser melting and material solidification. This complexity means that computational simulations are extremely costly and high-quality experimental data are extremely difficult to obtain, so optimal exploration of the parameter space using all available information on the current uncertainty in the region of interest is highly desirable. Our Intelligent Digital Twin for this process includes a complete description of the target geometry of the object being printed and a set of data-driven and computational models for the different physical processes occurring in the system. The data-driven models consist of a set of Gaussian Processes (GPs) that can be trained using combinations of real-world sensor data and outputs from computational simulations. We illustrate the utility of our IDT by determining optimal input print parameters and obtaining Pareto fronts between competing priorities such as surface roughness and print time. We also demonstrate the potential of the IDT as an intelligent control system to respond to errors during the print process and dynamically improve final print quality.
Arden Phua · Gary Delaney · Peter S. Cook · Chris Davies


Reinforced Genetic Algorithm for Structure-based Drug Design
(Poster)
Structure-based drug design (SBDD) aims to discover drug candidates by finding molecules (ligands) that bind tightly to a disease-related protein (target), which is the primary approach to computer-aided drug discovery. Recently, applying deep generative models for three-dimensional (3D) molecular design conditioned on protein pockets to solve SBDD has attracted much attention, but their formulation as probabilistic modeling often leads to unsatisfactory optimization performance. On the other hand, traditional combinatorial optimization methods such as genetic algorithms (GA) have demonstrated state-of-the-art performance in various molecular optimization tasks. However, they do not utilize protein target structure to inform design steps but rely on a random-walk-like exploration, which leads to unstable performance and no knowledge transfer between different tasks despite the similar binding physics. To achieve a more stable and efficient SBDD, we propose Reinforced Genetic Algorithm, which uses neural models to prioritize the profitable design steps and suppress random-walk behavior. The neural models take the 3D structure of the targets and ligands as inputs and are pre-trained using native complex structures to utilize the knowledge of the shared binding physics from different targets and then fine-tuned during optimization. We conduct thorough empirical studies on optimizing binding affinity to various disease targets and show that Reinforced Genetic Algorithm outperforms the baselines in terms of docking scores and is more robust to random initializations. The ablation study also indicates that training on different targets helps improve performance by leveraging the shared underlying physics of the binding processes.
Tianfan Fu · Wenhao Gao · Connor Coley · Jimeng Sun


Improving Subgraph Representation Learning via Multi-View Augmentation
(Poster)
Subgraph representation learning based on Graph Neural Networks (GNNs) has broad applications in chemistry and biology, such as molecule property prediction and gene collaborative function prediction. On the other hand, graph augmentation techniques have shown promising results in improving graph-based and node-based classification tasks but are rarely explored in the GNN-based subgraph representation learning literature. In this work, we develop a novel multi-view augmentation mechanism to improve subgraph representation learning and thus the accuracy of downstream prediction tasks. The augmentation technique creates multiple variants of subgraphs and embeds these variants into the original graph to achieve high training efficiency, scalability, and improved accuracy. Experiments on several real-world subgraph benchmarks demonstrate the superiority of our proposed multi-view augmentation techniques.
Yili Shen · Jiaxu Yan · Cheng-Wei Ju · Jun Yi · Zhou Lin · Hui Guan


Path Integral Stochastic Optimal Control for Sampling Transition Paths
(Poster)
We consider the problem of sampling transition paths. Given two metastable conformational states of a molecular system, e.g., a folded and an unfolded protein, we aim to sample the most likely transition path between the two states. Sampling such a transition path is computationally expensive due to the existence of high free energy barriers between the two states. To circumvent this, previous work has focused on simplifying the trajectories to occur along specific molecular descriptors called Collective Variables (CVs). However, finding CVs is not trivial and requires chemical intuition. For larger molecules, where intuition is not sufficient, using these CV-based methods biases the transition along possibly irrelevant dimensions. Instead, this work proposes a method for sampling transition paths that considers the entire geometry of the molecules. To achieve this, we first relate the problem to recent work on the Schrödinger bridge problem and stochastic optimal control. Using this relation, we construct a method that takes into account important characteristics of molecular systems such as second-order dynamics and invariance to rotations and translations. We demonstrate our method on the commonly studied Alanine Dipeptide, but also consider larger proteins such as Polyproline and Chignolin.
Lars Holdijk · Yuanqi Du · Priyank Jaini · Ferry Hooft · Bernd Ensing · Max Welling


Evaluating Self-Supervised Learned Molecular Graphs
(Poster)
Because of data scarcity in real-world scenarios, obtaining pre-trained representations via self-supervised learning (SSL) has attracted increasing interest. Although various methods have been proposed, it is still under-explored what knowledge the networks learn from the pre-training tasks and how it relates to downstream properties. In this work, with an emphasis on chemical molecular graphs, we fill this gap by devising a range of node-level, pair-level, and graph-level probe tasks to analyse the representations from pre-trained graph neural networks (GNNs). We empirically show that: (1) pre-trained models have better downstream performance than randomly initialised models due to their improved capability of capturing global topology and recognising substructures; (2) however, randomly initialised models outperform pre-trained models in terms of retaining local topology, and such information gradually disappears from the early layers to the last layers for pre-trained models.
Hanchen Wang · Shengchao Liu · Jean Kaddour · Qi Liu · Jian Tang · Matt Kusner · Joan Lasenby


Unifying physical systems' inductive biases in neural ODE using dynamics constraints
(Poster)
Conservation of energy is at the core of many physical phenomena and dynamical systems. There have been a significant number of works in the past few years aimed at predicting the trajectory of motion of dynamical systems using neural networks while adhering to the law of conservation of energy. Most of these works are inspired by classical mechanics, such as Hamiltonian and Lagrangian mechanics, as well as Neural Ordinary Differential Equations. While these works have been shown to work well in their respective domains, there is a lack of a unifying method that is more generally applicable without requiring significant changes to the neural network architectures. In this work, we aim to address this issue by providing a simple method that can be applied not just to energy-conserving systems but also to dissipative systems, by including a different inductive bias in each case in the form of a regularisation term in the loss function. The proposed method does not require changing the neural network architecture and could form the basis for validating a novel idea, therefore showing promise to accelerate research in this direction.
Yi Heng Lim · Muhammad Kasim
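
As a rough illustration of the kind of regularisation term the abstract above describes, the sketch below adds an energy-rate penalty to an ordinary derivative-matching loss. It is not the authors' formulation: the unit harmonic oscillator energy H = (q^2 + p^2) / 2, the function names, and the exact form of the two penalties are all assumptions made for this example.

```python
def energy_rate(state, deriv):
    """dH/dt for the assumed energy H = (q^2 + p^2) / 2 along (dq/dt, dp/dt)."""
    q, p = state
    dq, dp = deriv
    return q * dq + p * dp


def regularised_loss(states, pred_derivs, true_derivs, weight=1.0, dissipative=False):
    """Mean squared derivative error plus a weighted energy-rate penalty.

    Assumed penalties (hypothetical, for illustration only):
    - conservative case: penalise any change in energy, (dH/dt)^2
    - dissipative case: penalise only energy growth, max(dH/dt, 0)^2
    """
    n = len(states)
    data_term = sum(
        (pd[0] - td[0]) ** 2 + (pd[1] - td[1]) ** 2
        for pd, td in zip(pred_derivs, true_derivs)
    ) / n
    rates = [energy_rate(s, d) for s, d in zip(states, pred_derivs)]
    if dissipative:
        penalty = sum(max(r, 0.0) ** 2 for r in rates) / n
    else:
        penalty = sum(r ** 2 for r in rates) / n
    return data_term + weight * penalty


# Two phase-space points of a unit harmonic oscillator: dq/dt = p, dp/dt = -q.
states = [(1.0, 0.0), (0.0, 1.0)]
true_derivs = [(0.0, -1.0), (1.0, 0.0)]
# A hypothetical model output that adds a little damping: dp/dt = -q - 0.1 * p.
damped_derivs = [(0.0, -1.0), (1.0, -0.1)]

loss_exact = regularised_loss(states, true_derivs, true_derivs)
loss_conservative = regularised_loss(states, damped_derivs, true_derivs)
loss_dissipative = regularised_loss(states, damped_derivs, true_derivs, dissipative=True)
```

In this toy setting the damped prediction is penalised under the conservative bias but not under the dissipative one, which is the sense in which a single loss can switch inductive bias without touching the network architecture.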


PowerGraph: Using neural networks and principal components to determine multivariate statistical power trade-offs
(Poster)
Statistical power estimation for studies with multiple model parameters is inherently a multivariate problem. Power for individual parameters of interest cannot be reliably estimated univariately, since correlation and variance explained relative to one parameter will impact the power for another parameter, all usual univariate considerations being equal. Explicit solutions in such cases, especially for models with many parameters, are either impractical or impossible to obtain, leaving researchers with the prevailing method of simulating power. However, the point estimates for a vector of model parameters are uncertain, and the impact of inaccuracy is unknown. In such cases, sensitivity analysis is recommended, such that multiple combinations of possible observable parameter vectors are simulated to understand power trade-offs. A limitation of this approach is that it is computationally expensive to generate sufficient sensitivity combinations to accurately map the power trade-off function in the increasingly high-dimensional spaces of the models that social scientists estimate. This paper explores the efficient estimation and graphing of statistical power for a study over varying model parameter combinations. We propose a simple and generalizable machine-learning-inspired solution that cuts the computational cost to less than 10% of the brute-force method while providing F1 scores above 90%. We further motivate the impact of transfer learning in learning power manifolds across varying distributions.
Ajinkya Mulay · Sean Lane · Erin Hennes
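
For readers unfamiliar with the brute-force baseline the abstract above refers to, the following is a minimal sketch of power estimation by simulation. It covers only a univariate toy case (a two-sided z-test on a single mean with known unit variance), not the multivariate setting or the neural approach of the paper; the function name, grid values, and simulation counts are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, mean


def simulated_power(effect, n, n_sims=2000, alpha=0.05, seed=0):
    """Fraction of simulated studies in which a two-sided z-test rejects H0: mu = 0.

    Each simulated study draws n observations from Normal(effect, 1),
    with the unit variance treated as known.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(effect, 1.0) for _ in range(n)]
        z = mean(sample) * n ** 0.5  # standardised sample mean
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


# Sensitivity sweep over a small grid of (effect size, sample size) pairs.
power_grid = {
    (effect, n): simulated_power(effect, n)
    for effect in (0.2, 0.5)
    for n in (20, 50)
}

for (effect, n), power in sorted(power_grid.items()):
    print(f"effect={effect}, n={n}: estimated power {power:.2f}")
```

Every grid cell needs its own batch of simulated studies, so the cost grows quickly with the number of parameters being varied, which is the expense the abstract sets out to reduce.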


From Kepler to Newton: Explainable AI for Science Discovery
(Poster)
The Observation-Hypothesis-Prediction-Experimentation loop paradigm for scientific research has been practiced by researchers for years in pursuit of scientific discoveries. However, with the data explosion in both mega-scale and milli-scale scientific research, it has sometimes been very difficult to manually analyze the data and propose new hypotheses to drive the cycle of scientific discovery. In this paper, we discuss the role of Explainable AI in the scientific discovery process by demonstrating an Explainable AI-based paradigm for science discovery. The key is to use Explainable AI to help derive data or model interpretations as well as scientific discoveries or insights. We show how computational and data-intensive methodology, together with experimental and theoretical methodology, can be seamlessly integrated for scientific research. To demonstrate the AI-based science discovery process, and to pay our respects to some of the greatest minds in human history, we show how Kepler's laws of planetary motion and Newton's law of universal gravitation can be rediscovered by (Explainable) AI based on the astronomical observation data of Tycho Brahe, whose work led the scientific revolution in the 16-17th century. This work also highlights the important role of Explainable AI (as compared to black-box AI) in science discovery to help humans prevent or better prepare for the possible technological singularity that may happen in the future.
Zelong Li · Jianchao Ji · Yongfeng Zhang
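
To make the idea of rediscovering Kepler's third law from data concrete, here is a minimal sketch that fits a power law T = C * a^k by least squares in log-log space. It is only an illustration and not the method of the paper above: the data are rounded modern textbook values (semi-major axis in astronomical units, orbital period in years) rather than Tycho Brahe's observations, and the function name is an assumption.

```python
import math

# Rounded modern textbook values: (semi-major axis in AU, orbital period in years).
# These are NOT Tycho Brahe's observations; they are used only for illustration.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}


def fit_power_law(pairs):
    """Least-squares fit of log(T) = k * log(a) + c; returns (k, c)."""
    xs = [math.log(a) for a, _ in pairs]
    ys = [math.log(t) for _, t in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    c = mean_y - k * mean_x
    return k, c


exponent, intercept = fit_power_law(list(PLANETS.values()))
print(f"Fitted law: T ~ a^{exponent:.3f}")
```

The fitted exponent comes out close to 3/2, i.e. T^2 is proportional to a^3. Because the result is a two-parameter formula that can be read directly, it is interpretable in a way that a black-box regressor fitted to the same six points would not be.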


LAST: Latent Space Assisted Adaptive Sampling for Protein Trajectories
(Poster)
Molecular dynamics (MD) simulation is widely used to study protein conformations and dynamics. However, conventional simulation suffers from being trapped in local energy minima that are hard to escape. Thus, most computational time is spent sampling already visited regions. This leads to an inefficient sampling process and further hinders the exploration of protein movements in affordable simulation time. The advancement of deep learning provides new opportunities for protein sampling. Variational autoencoders are a class of deep learning models that learn a low-dimensional representation (referred to as the latent space) that can capture the key features of the input data. Based on this characteristic, we propose a new adaptive sampling method, latent space assisted adaptive sampling for protein trajectories (LAST), to accelerate the exploration of protein conformational space. This method comprises cycles of (i) variational autoencoder training, (ii) seed structure selection on the latent space and (iii) conformational sampling through additional MD simulations. The proposed approach is validated through the sampling of four structures of two protein systems: two metastable states of E. coli adenosine kinase (ADK) and two native states of Vivid (VVD). In all four conformations, seed structures were shown to lie on the boundary of conformation distributions. Moreover, large conformational changes were observed in a shorter simulation time when compared with conventional MD (cMD) simulations in both systems. In metastable ADK simulations, LAST explored two transition paths toward two stable states while cMD became trapped in an energy basin. In VVD light state simulations, LAST was three times faster than cMD simulation in covering a similar conformational space.
Hao Tian · Xi Jiang · Sian Xiao · Hunter La Force · Eric Larson · Peng Tao


One-Shot Transfer Learning of Physics-Informed Neural Networks
(Poster)
Solving differential equations efficiently and accurately sits at the heart of progress in many areas of scientific research, from classical dynamical systems to quantum mechanics. There is a surge of interest in using Physics-Informed Neural Networks (PINNs) to tackle such problems, as they provide numerous benefits over traditional numerical approaches. Despite the potential benefits of PINNs for solving differential equations, transfer learning for them has been underexplored. In this study, we present a general framework for transfer learning PINNs that results in one-shot inference for linear systems of both ordinary and partial differential equations. This means that highly accurate solutions to many unknown differential equations can be obtained instantaneously without retraining an entire network. We demonstrate the efficacy of the proposed deep learning approach by solving several real-world problems, such as first- and second-order linear ordinary equations, the Poisson equation, and the time-dependent Schrödinger complex-value partial differential equation.
Shaan Desai · Marios Mattheakis · Hayden Joy · Pavlos Protopapas · S Roberts


Weakly Supervised Inversion of Multi-physics Data for Geophysical Properties
(Poster)
Multi-physics inversion plays a critical role in geophysics. It has been widely used to simultaneously infer various geophysical properties (such as velocity and conductivity). Among those inversion problems, some are explicitly governed by partial differential equations (PDEs), while others are not. Without explicit governing equations, conventional physics-based inversion techniques are not feasible and data-driven inversion requires expensive full labels. To overcome this issue, we propose a new data-driven multi-physics inversion technique with extremely weak supervision. Our key finding is that pseudo labels can be constructed by learning the local relationship among geophysical properties at very sparse locations. We explore the multi-physics inversion problem from two distinct measurements (seismic and electromagnetic data) to three geophysical properties (velocity, conductivity, and CO₂ saturation) with synthetic data based on the Kimberlina storage reservoir in California. Our results show that we are able to invert for properties without explicit governing equations. Moreover, the labeled data on the three geophysical properties can be reduced 50-fold (from 100 down to only 2 locations).
Shihang Feng · Peng Jin · Yinpeng Chen · Xitong Zhang · Zicheng Liu · David Alumbaugh · Michael Commer · Youzuo Lin


How Much of the Chemical Space Has Been Explored? Selecting the Right Exploration Measure for Drug Discovery
(Poster)
Forming a molecular candidate set that contains a wide range of potentially effective compounds is crucial to the success of drug discovery. While many aim to optimize particular chemical properties, there is limited literature on how to properly measure and encourage the exploration of the chemical space when generating drug candidates. This problem is challenging due to the lack of formal criteria to select good exploration measures. We propose a novel framework to systematically evaluate exploration measures for drug candidate generation. The procedure is built upon three formal analyses: an axiomatic analysis that validates the potential measures analytically, an empirical analysis that compares the correlations of the measures to a proxy gold standard, and a practical analysis that benchmarks the effectiveness of the measures in an optimization procedure of molecular generation. We are able to evaluate a wide range of potential exploration measures under this framework and make recommendations on existing and novel exploration measures that are suitable for the task of drug discovery.
Yutong Xie · Ziqiao Xu · Jiaqi Ma · Qiaozhu Mei


No Free Lunch from Deep Learning in Neuroscience: A Case Study through Models of the Entorhinal-Hippocampal Circuit
(Poster)
Fundamental research in Neuroscience is currently undergoing a renaissance based on deep learning. The central promises of deep learning-based modeling of brain circuits are that the models shed light on evolutionary optimization problems, constraints and solutions, and generate novel predictions regarding neural phenomena. We show, through the case study of grid cells in the entorhinal-hippocampal circuit, that one often gets neither. We begin by reviewing the principles of grid cell mechanism and function obtained from analytical and first-principles modeling efforts, then consider the claims of deep learning models of grid cells and rigorously examine their results under varied conditions. Using large-scale hyperparameter sweeps and hypothesis-driven experimentation, we demonstrate that the results of such models may reveal more about particular and non-fundamental implementation choices than fundamental truths about neural circuits or the loss function(s) they might optimize. Finally, we discuss why these models of the brain cannot be expected to work without the addition of substantial amounts of inductive bias, an informal No Free Lunch theorem for Neuroscience. In conclusion, caution and consideration, together with biological knowledge, are warranted in building and interpreting deep learning models in Neuroscience.
Rylan Schaeffer · Mikail Khona · Ila R. Fiete


Predicting generalization with degrees of freedom in neural networks
(
Poster
)
link »
Model complexity is fundamentally tied to predictive power in the sciences as well as in applications. However, there is a divergence between naive measures of complexity such as parameter count and the generalization performance of overparameterized machine learning models. Prior empirical approaches to capturing intrinsic complexity independent of parameter count are computationally intractable, do not capture the implicitly regularizing effects of the entire machine-learning pipeline, or do not provide a quantitative fit to the double-descent behavior of overparameterized models. In this work, we introduce an empirical complexity measure inspired by the classical notion of degrees of freedom in statistics. This measure can be approximated efficiently and is a function of the entire model training pipeline. We demonstrate that this measure strongly correlates with generalization performance in the double-descent regime.
Erin Grant · Yan Wu 🔗 
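
For readers unfamiliar with the classical notion of degrees of freedom mentioned in the abstract above, the following is a minimal sketch for ridge regression only, where the effective degrees of freedom equal the trace of the hat matrix and can be written in terms of the singular values of the design matrix. It is not the authors' measure; the function name and arguments are hypothetical and chosen for illustration.

```python
def ridge_degrees_of_freedom(singular_values, ridge_penalty):
    """Effective degrees of freedom of ridge regression.

    For a design matrix with singular values d_i and penalty lam >= 0,
    the trace of the hat matrix X (X^T X + lam I)^{-1} X^T equals
    sum_i d_i^2 / (d_i^2 + lam).
    """
    if ridge_penalty < 0:
        raise ValueError("ridge_penalty must be non-negative")
    total = 0.0
    for d in singular_values:
        if d == 0:
            # A zero singular value contributes nothing to the trace.
            continue
        total += (d * d) / (d * d + ridge_penalty)
    return total


if __name__ == "__main__":
    # Illustrative singular values (not taken from any dataset).
    d = [1.0, 2.0, 3.0]
    for lam in (0.0, 1.0, 100.0):
        print(lam, ridge_degrees_of_freedom(d, lam))
```

With no penalty the value equals the number of non-zero singular values, and it shrinks toward zero as the penalty grows, which is why it is read as a complexity measure that does not simply count parameters.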


Unsupervised Discovery of Inertial-Fusion-Relevant Plasma Physics using Differentiable Kinetic Simulations
(
Poster
)
link »
Plasma supports collective modes and particle-wave interactions that lead to complex behavior in inertial fusion energy applications. While plasma can sometimes be modeled as a charged fluid, a kinetic description is useful for studying nonlinear effects in the higher-dimensional momentum-position phase space that describes the full complexity of plasma dynamics. We create a differentiable solver for the plasma kinetics 3D partial differential equation and introduce a domain-specific objective function. Using this framework, we perform gradient-based optimization of neural networks that provide forcing function parameters to the differentiable solver given a set of initial conditions. We apply this to an inertial-fusion-relevant configuration and find that the optimization process exploits a novel physical effect that has previously remained undiscovered.
Archis Joglekar · Alexander Thomas 🔗 


MultiScale MeshGraphNets
(
Poster
)
link »
SlidesLive Video » In recent years, there has been a growing interest in using machine learning to overcome the high cost of numerical simulation, with some learned models achieving impressive speed-ups over classical solvers whilst maintaining accuracy. However, these methods are usually tested at low-resolution settings, and it remains to be seen whether they can scale to the costly high-resolution simulations that we ultimately want to tackle. In this work, we propose two complementary approaches to improve the framework from MeshGraphNets, which demonstrated accurate predictions in a broad range of physical systems. MeshGraphNets relies on a message passing graph neural network to propagate information, and this structure becomes a limiting factor for high-resolution simulations, as equally distant points in space become further apart in graph space. First, we demonstrate that it is possible to learn accurate surrogate dynamics of a high-resolution system on a much coarser mesh, both removing the message passing bottleneck and improving performance; and second, we introduce a hierarchical approach (MultiScale MeshGraphNets) which passes messages on two different resolutions (fine and coarse), significantly improving the accuracy of MeshGraphNets while requiring fewer computational resources.
Meire Fortunato · Tobias Pfaff · Peter Wirnsberger · Alexander Pritzel · Peter Battaglia 🔗 


Learning to Solve PDE-constrained Inverse Problems with Graph Networks
(
Poster
)
link »
SlidesLive Video » Learned graph neural networks (GNNs) have recently been established as fast and accurate alternatives to principled solvers in simulating the dynamics of physical systems. In many application domains across science and engineering, however, we are not only interested in a forward simulation but also in solving inverse problems with constraints defined by a partial differential equation (PDE). Here we explore GNNs to solve such PDE-constrained inverse problems. Given a sparse set of measurements, we are interested in recovering the initial condition or parameters of the PDE. We demonstrate that GNNs combined with autodecoder-style priors are well suited for these tasks, achieving more accurate estimates of initial conditions or physical parameters than other learned approaches when applied to the wave equation or Navier-Stokes equations. We also demonstrate computational speedups of up to 90x using GNNs compared to principled solvers.
Qingqing Zhao · David B. Lindell · Gordon Wetzstein 🔗 


Learning to Solve PDE-constrained Inverse Problems with Graph Networks
(
Oral
)
link »
Learned graph neural networks (GNNs) have recently been established as fast and accurate alternatives to principled solvers in simulating the dynamics of physical systems. In many application domains across science and engineering, however, we are not only interested in a forward simulation but also in solving inverse problems with constraints defined by a partial differential equation (PDE). Here we explore GNNs to solve such PDE-constrained inverse problems. Given a sparse set of measurements, we are interested in recovering the initial condition or parameters of the PDE. We demonstrate that GNNs combined with autodecoder-style priors are well suited for these tasks, achieving more accurate estimates of initial conditions or physical parameters than other learned approaches when applied to the wave equation or Navier-Stokes equations. We also demonstrate computational speedups of up to 90x using GNNs compared to principled solvers.
Qingqing Zhao · David B. Lindell · Gordon Wetzstein 🔗 


Curvature-informed multi-task learning for graph networks
(
Poster
)
link »
SlidesLive Video » Properties of interest for crystals and molecules, such as band gap, elasticity, and solubility, are generally related to each other: they are governed by the same underlying laws of physics. However, when state-of-the-art graph neural networks attempt to predict multiple properties simultaneously (the multi-task learning (MTL) setting), they frequently underperform a suite of single-property predictors. This suggests graph networks may not be fully leveraging these underlying similarities. Here we investigate a potential explanation for this phenomenon: the curvature of each property's loss surface varies significantly, leading to inefficient learning. This difference in curvature can be assessed by looking at spectral properties of the Hessians of each property's loss function, which is done in a matrix-free manner via randomized numerical linear algebra. We evaluate our hypothesis on two benchmark datasets (Materials Project (MP) and QM8) and consider how these findings can inform the training of novel multi-task learning models.
Alexander New · Michael Pekala · Nam Q. Le · Janna Domenico · Christine Piatko · Christopher Stiles 🔗 
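
As an illustration of what a matrix-free, randomized estimate of a spectral quantity can look like, the sketch below uses Hutchinson's trace estimator, which needs only matrix-vector products and never forms the matrix. The abstract above does not say which randomized method the authors use, so this is an assumption for illustration; `hutchinson_trace` and `matvec` are hypothetical names, and in a real setting `matvec` would be a Hessian-vector product supplied by an automatic differentiation library.

```python
import random


def hutchinson_trace(matvec, dim, num_probes=100, seed=0):
    """Estimate trace(A) using only products v -> A v.

    Draws Rademacher probe vectors z (entries +1 or -1) and averages
    z^T (A z), which is an unbiased estimate of the trace of A.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        az = matvec(z)  # the only access to A
        total += sum(zi * ai for zi, ai in zip(z, az))
    return total / num_probes


if __name__ == "__main__":
    # Stand-in for a Hessian: a small symmetric matrix with trace 5.
    a = [[2.0, 1.0], [1.0, 3.0]]

    def matvec(v):
        return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

    print(hutchinson_trace(matvec, dim=2, num_probes=2000))
```

The estimate is exact for diagonal matrices and becomes noisier as off-diagonal mass grows, so the number of probes trades accuracy against the cost of extra matrix-vector products.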


Neural Basis Functions for Accelerating Solutions to high Mach Euler Equations
(
Poster
)
link »
SlidesLive Video » We propose an approach to solving partial differential equations (PDEs) using a set of neural networks which we call Neural Basis Functions (NBF). This NBF framework is a novel variation of the POD-DeepONet operator learning approach where we regress a set of neural networks onto a reduced-order Proper Orthogonal Decomposition (POD) basis. These networks are then used in combination with a branch network that ingests the parameters of the prescribed PDE to compute a reduced-order approximation to the PDE. This approach is applied to the steady-state Euler equations for high-speed flow conditions (Mach 10-30) where we consider the 2D flow around a cylinder which develops a shock condition. We then use the NBF predictions as initial conditions to a high-fidelity Computational Fluid Dynamics (CFD) solver (CFD++) to show faster convergence. Lessons learned for training and implementing this algorithm will be presented as well.
David Witman · Alexander New · Hicham Alkandry · Honest Mrema 🔗 


MAgNet: Mesh Agnostic Neural PDE Solver
(
Poster
)
link »
SlidesLive Video » The computational complexity of classical numerical methods for solving Partial Differential Equations (PDE) scales significantly as the resolution increases. When it comes to climate predictions, fine spatio-temporal resolutions are required to resolve all turbulent scales in the fluid simulations. This makes the task of accurately resolving these scales computationally out of reach even with modern supercomputers. As a result, climate modelers solve these PDEs on grids that are too coarse (3 km to 200 km on each side), which hinders the accuracy and usefulness of the predictions. In this paper, we leverage the recent advances in Implicit Neural Representations (INR) to design a novel architecture that predicts the spatially continuous solution of a PDE given a spatial position query. By augmenting coordinate-based architectures with Graph Neural Networks (GNN), we enable zero-shot generalization to new non-uniform meshes and long-term predictions up to 250 frames ahead that are physically consistent. Our Mesh Agnostic Neural PDE Solver (MAgNet) is able to make accurate predictions across a variety of PDE simulation datasets and compares favorably with existing baselines. Moreover, our model generalizes well to different meshes and resolutions up to four times those trained on.
Oussama Boussif · Yoshua Bengio · Loubna Benabbou · Dan Assouline 🔗 


Multiscale Neural Operator: Learning Fast and Grid-independent PDE Solvers
(
Poster
)
link »
SlidesLive Video »
Numerical simulations in climate, chemistry, or astrophysics are computationally too expensive for uncertainty quantification or parameter exploration at high resolution. Reduced-order or surrogate models are multiple orders of magnitude faster, but traditional surrogates are inflexible or inaccurate and pure machine learning (ML)-based surrogates are too data-hungry. We propose a hybrid, flexible surrogate model that exploits known physics for simulating large-scale dynamics and limits learning to the hard-to-model term, which is called parametrization or closure and captures the effect of fine- onto large-scale dynamics. Leveraging neural operators, we are the first to learn grid-independent, non-local, and flexible parametrizations. Our $\textit{multiscale neural operator}$ is motivated by a rich literature in multiscale modeling, has quasilinear runtime complexity, is more accurate or flexible than state-of-the-art parametrizations, and is demonstrated on the chaotic multiscale Lorenz96 equation.

Björn Lütjens · Catherine Crawford · Campbell Watson · Christopher Hill · Dava Newman 🔗 


Differentiable Physics Simulations with Contacts: Do They Have Correct Gradients w.r.t. Position, Velocity and Control?
(
Poster
)
link »
SlidesLive Video » In recent years, an increasing amount of work has focused on differentiable physics simulation and has produced a set of open source projects such as Tiny Differentiable Simulator, Nimble Physics, diffTaichi, Brax, Warp and DiffCoSim. By making physics simulations end-to-end differentiable, we can perform gradient-based optimization and learning tasks. A majority of differentiable simulators consider collisions and contacts between objects, but they use different contact models for differentiability. In this paper, we give an overview of four kinds of differentiable contact formulations: linear complementarity problems (LCP), convex optimization models, compliant models and position-based dynamics (PBD). We analyze and compare the gradients calculated by these models and show that the gradients are not always correct. We also demonstrate their ability to learn an optimal control strategy by comparing the learned strategies with the optimal strategy in an analytical form.
Yaofeng Zhong · Jiequn Han · Georgia Olympia Brikis 🔗 


Transform Once: Efficient Operator Learning in Frequency Domain
(
Poster
)
link »
SlidesLive Video »
Spectrum analysis provides one of the most effective paradigms for information-preserving dimensionality reduction in data: often, a simple description of naturally occurring signals can be obtained via a few terms of periodic basis functions. Neural operators designed for frequency domain learning are based on complex-valued transforms, i.e., Fourier Transforms (FT), and layers that perform computation on the spectrum and input data separately. This design introduces considerable computational overhead: for each layer, a forward and inverse FT. Instead, this work introduces a blueprint for frequency domain learning through a single transform: transform once (T1). To enable efficient, direct learning in the frequency domain we develop a variance-preserving weight initialization scheme and address the open problem of choosing a transform. Our results significantly streamline the design process of neural operators, pruning redundant transforms, and leading to speedups of 3x to 30x that increase with data resolution and model size. We perform extensive experiments on learning to solve partial differential equations, including incompressible Navier-Stokes, turbulent flows around airfoils, and high-resolution video of smoke dynamics. T1 models improve on the test performance of SOTA neural operators while requiring significantly less computation, with over $30\%$ reduction in predictive error across tasks.

Michael Poli · Stefano Massaroli · Federico Berto · Jinkyoo Park · Tri Dao · Christopher Re · Stefano Ermon 🔗 


Provable Concept Learning for Interpretable Predictions Using Variational Autoencoders
(
Poster
)
link »
SlidesLive Video » In safety-critical applications, practitioners are reluctant to trust neural networks when no interpretable explanations are available. Many attempts to provide such explanations revolve around pixel-level attributions or use previously known concepts. In this paper we aim to provide explanations by provably identifying high-level, previously unknown concepts. To this end, we propose a probabilistic modeling framework to derive (C)oncept (L)earning and (P)rediction (CLAP), a VAE-based classifier that uses visually interpretable concepts as linear predictors. Assuming that the data generating mechanism involves interpretable concepts, we prove that our method is able to identify them while attaining optimal classification accuracy. We use synthetic experiments for validation, and also show that on the ChestXRay dataset, CLAP effectively discovers interpretable factors for classifying diseases.
Armeen Taeb · Nicolò Ruggeri · Carina Schnuck · Fanny Yang 🔗 


Recovering Stochastic Dynamics via Gaussian Schrödinger Bridges
(
Poster
)
link »
SlidesLive Video »
We propose a new framework to reconstruct a stochastic process $\left\{\mathbb{P}_{t}: t \in[0, T]\right\}$ using only samples from its marginal distributions, observed at start and end times 0 and $T$. This reconstruction is useful to infer population dynamics, a crucial challenge, e.g., when modeling the time evolution of cell populations from single-cell sequencing data. Our general framework encompasses the more specific Schrödinger bridge $(\mathrm{SB})$ problem, where $\mathbb{P}_{t}$ represents the evolution of a thermodynamic system at almost equilibrium. Estimating such bridges is notoriously difficult, motivating our proposal for a novel adaptive scheme called the GSBflow. Our goal is to rely on Gaussian approximations of the data to provide the reference stochastic process needed to estimate SB. To that end, we solve the SB problem with Gaussian marginals, for which we provide, as a central contribution, a closed-form solution and SDE representation. We use these formulas to define the reference process used to estimate more complex SBs, and obtain notable numerical improvements when reconstructing both synthetic processes and single-cell genomics.

Charlotte Bunne · Ya-Ping Hsieh · Marco Cuturi · Andreas Krause 🔗 


Towards Learning Self-Organized Criticality of Rydberg Atoms using Graph Neural Networks
(
Poster
)
link »
SlidesLive Video » Self-Organized Criticality (SOC) is a ubiquitous dynamical phenomenon believed to be responsible for the emergence of universal scale-invariant behavior in many seemingly unrelated systems, such as forest fires, virus spreading or atomic excitation dynamics. SOC describes the buildup of large-scale and long-range spatio-temporal correlations as a result of only local interactions and dissipation. The simulation of SOC dynamics is typically based on Monte Carlo (MC) methods, which are, however, numerically expensive and do not scale beyond certain system sizes. We investigate the use of Graph Neural Networks (GNNs) as an effective surrogate model to learn the dynamics operator for a paradigmatic SOC system, inspired by an experimentally accessible physics example: driven Rydberg atoms. To this end, we generalize existing GNN simulation approaches to predict dynamics for the internal state of the node. We show that we can accurately reproduce the MC dynamics as well as generalize along the two important axes of particle number and particle density. This paves the way to model much larger systems beyond the limits of traditional MC methods. While the exact system is inspired by the dynamics of Rydberg atoms, the approach is quite general and can readily be applied to other systems.
Simon Ohler · Daniel Brady · Winfried Lötzsch · Michael Fleischhauer · Johannes Otterbach 🔗 


Learning the Solution Operator of Boundary Value Problems using Graph Neural Networks
(
Poster
)
link »
SlidesLive Video » As an alternative to classical numerical solvers for partial differential equations (PDEs) subject to boundary value constraints, there has been a surge of interest in investigating neural networks that can solve such problems efficiently. In this work, we design a general solution operator for two different time-independent PDEs using graph neural networks (GNNs) and spectral graph convolutions. We train the networks on simulated data from a finite element solver on a variety of shapes and inhomogeneities. In contrast to previous works, we focus on the ability of the trained operator to generalize to previously unseen scenarios. Specifically, we test generalization to meshes with different shapes and superposition of solutions for a different number of inhomogeneities. We find that training on a diverse dataset with lots of variation in the finite element meshes is a key ingredient for achieving good generalization results in all cases. With this, we believe that GNNs can be used to learn solution operators that generalize over a range of properties and produce solutions much faster than a generic solver. Our dataset, which we make publicly available, can be used and extended to verify the robustness of these models under varying conditions.
Winfried Lötzsch · Simon Ohler · Johannes Otterbach 🔗 


Pretraining Transformers for Molecular Property Prediction Using Reaction Prediction
(
Poster
)
link »
SlidesLive Video » Molecular property prediction is essential in chemistry, especially for drug discovery applications. However, available molecular property data is often limited, encouraging the transfer of information from related data. Transfer learning has had a tremendous impact in fields like Computer Vision and Natural Language Processing, signalling its potential in molecular property prediction. We present a pretraining procedure for molecular representation learning using reaction data and use it to pretrain a SMILES Transformer. We fine-tune and evaluate the pretrained model on 12 molecular property prediction tasks from MoleculeNet within physical chemistry, biophysics, and physiology and show a statistically significant positive effect on 5 of the 12 tasks compared to a non-pretrained baseline model.
Johan Broberg · Maria Bånkestad · Erik Ylipää 🔗 


Featurizations Matter: A Multiview Contrastive Learning Approach to Molecular Pretraining
(
Poster
)
link »
Molecular representation learning, which aims to automate feature learning for molecules, is a vital task in computational chemistry and drug discovery. Despite rapid advances in molecular pretraining models with various types of featurizations, from SMILES strings and 2D graphs to 3D geometry, there is a paucity of research on how to utilize different molecular featurization techniques to obtain better representations. To bridge that gap, we present a novel multi-view contrastive learning approach dubbed MEMO in this paper. Our pretraining framework, in particular, is capable of learning from four basic but nontrivial featurizations of molecules and adaptively learning to optimize the combinations of featurization techniques for different downstream tasks. Extensive experiments on a broad range of molecular property prediction benchmarks show that our MEMO outperforms state-of-the-art baselines and also yields a reasonable interpretation of molecular featurization weights in accordance with chemical knowledge.
Yanqiao Zhu · Dingshuo Chen · Yuanqi Du · Yingze Wang · Qiang Liu · Shu Wu 🔗 


Sample Efficiency Matters: Benchmarking Molecular Optimization
(
Poster
)
link »
Efficient molecular design is one of the fundamental goals of computer-aided drug or material design. In recent years, significant progress has been made in solving challenging problems across various aspects of computational molecular optimizations, with an emphasis on achieving high validity, diversity, novelty, and most recently synthesizability. However, a crucial aspect that is rarely discussed is the budget spent on the optimization. If candidates are evaluated by experiment or high-fidelity simulation, as they are in realistic discovery settings, sample efficiency is paramount. In this paper, we thoroughly investigate 13 molecular design algorithms across 21 tasks within a limited oracle setting, allowing at most 10000 queries. We illustrate that most "state-of-the-art" methods fail to outperform some classic algorithms. Our results also highlight the influence of the generative action space (e.g., token-by-token, atom-by-atom, fragment-by-fragment) on performance and the necessity of multiple independent runs and hyperparameter tuning. We suggest a standard experimental benchmark to minimize the wasted effort caused by non-reproducibility, artificially poor baselines, and easily misinterpreted results.
Wenhao Gao · Tianfan Fu · Jimeng Sun · Connor Coley 🔗 
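
The limited oracle setting described in the abstract above can be pictured as a scoring function wrapped in a query counter. The sketch below is one hypothetical way to enforce such a budget and is not the authors' benchmark code: `BudgetedOracle`, `BudgetExhausted`, and the choice to charge every call (including repeated candidates) are assumptions made for illustration.

```python
class BudgetExhausted(RuntimeError):
    """Raised when the oracle is called after the budget is spent."""


class BudgetedOracle:
    """Wrap a scoring function so that it can be called a limited number of times."""

    def __init__(self, score_fn, max_queries=10000):
        self.score_fn = score_fn
        self.max_queries = max_queries
        self.num_queries = 0
        self.best_score = None
        self.best_candidate = None

    def __call__(self, candidate):
        # Refuse the call before scoring, so the count never exceeds the budget.
        if self.num_queries >= self.max_queries:
            raise BudgetExhausted(f"budget of {self.max_queries} queries spent")
        self.num_queries += 1
        score = self.score_fn(candidate)
        # Track the best candidate seen so far (higher score is better here).
        if self.best_score is None or score > self.best_score:
            self.best_score = score
            self.best_candidate = candidate
        return score


if __name__ == "__main__":
    # Toy score: string length stands in for an expensive property evaluation.
    oracle = BudgetedOracle(len, max_queries=3)
    for smiles in ["C", "CCO", "CC"]:
        oracle(smiles)
    print(oracle.best_candidate, oracle.num_queries)
```

Comparing optimizers by the best score reached at a fixed `num_queries` makes sample efficiency, rather than final performance under an unlimited budget, the quantity being measured.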


The Bearable Lightness of Big Data: Towards Massive Public Datasets in Scientific Machine Learning
(
Poster
)
link »
SlidesLive Video » In general, large datasets enable deep learning models to perform with good accuracy and generalizability. However, massive high-fidelity simulation datasets (from molecular chemistry, astrophysics, computational fluid dynamics (CFD), etc.) can be challenging to curate due to dimensionality and storage constraints. Lossy compression algorithms can help mitigate limitations from storage, as long as the overall data fidelity is preserved. To illustrate this point, we demonstrate that deep learning models, trained and tested on data from a petascale CFD simulation, are robust to errors introduced during lossy compression in a semantic segmentation problem. Our results demonstrate that lossy compression algorithms offer a realistic pathway for exposing high-fidelity scientific data to open-source data repositories for building community datasets. In this paper, we outline, construct, and evaluate the requirements for establishing a big data framework for scientific machine learning.
Wai Tong Chung · Ki Jung · Jacqueline Chen · Matthias Ihme 🔗 


DEQGAN: Learning the Loss Function for PINNs with Generative Adversarial Networks
(
Poster
)
link »
SlidesLive Video »
Solutions to differential equations are of significant scientific and engineering relevance. Physics-Informed Neural Networks (PINNs) have emerged as a promising method for solving differential equations, but they lack a theoretical justification for the use of any particular loss function. This work presents Differential Equation GAN (DEQGAN), a novel method for solving differential equations using generative adversarial networks to "learn the loss function" for optimizing the neural network. Presenting results on a suite of twelve ordinary and partial differential equations, including the nonlinear Burgers', Allen-Cahn, Hamilton, and modified Einstein's gravity equations, we show that DEQGAN can obtain multiple orders of magnitude lower mean squared errors than PINNs that use $L_2$, $L_1$, and Huber loss functions. We also show that DEQGAN achieves solution accuracies that are competitive with popular numerical methods. Finally, we present two methods to improve the robustness of DEQGAN to different hyperparameter settings.

Blake Bullwinkel · Dylan Randle · Pavlos Protopapas · David Sondak 🔗 


Target-aware Molecular Graph Generation
(
Poster
)
link »
SlidesLive Video » Generating molecules with desired biological activities has attracted growing attention in drug discovery. Previous molecular generation models are designed as chemocentric methods that hardly consider the drug-target interaction, limiting their practical applications. In this paper, we aim to generate molecular drugs in a target-aware manner that bridges biological activity and molecular design. To solve this problem, we compile a benchmark dataset from several publicly available datasets and build baselines in a unified framework. Building on the recent advances in flow-based molecular generation models, we propose SiamFlow, which forces the flow to fit the distribution of target sequence embeddings in latent space. Specifically, we employ an alignment loss and a uniform loss to bring target sequence embeddings and drug graph embeddings into agreement while avoiding collapse. Furthermore, we formulate the alignment as a one-to-many problem by learning spaces of target sequence embeddings. Experiments quantitatively show that our proposed method learns meaningful representations in the latent space toward target-aware molecular graph generation and provides an alternative approach to bridge biology and chemistry in drug discovery.
Cheng Tan · Zhangyang Gao · Stan Z. Li 🔗 


Pretraining Graph Neural Networks for Molecular Representations: Retrospect and Prospect
(
Poster
)
link »
SlidesLive Video » Recent years have witnessed remarkable advances in molecular representation learning using Graph Neural Networks (GNNs). To fully exploit the unlabeled molecular data, researchers first pretrain GNNs on large-scale molecular databases and then fine-tune these pretrained Graph Models (GMs) in downstream tasks. The knowledge implicitly encoded in model parameters can benefit various downstream tasks and help to alleviate several fundamental challenges of molecular representation learning. In this paper, we provide a comprehensive survey of pretrained GMs for molecular representations. We first briefly present the limitations of molecular graph representation learning and thus introduce the motivation for molecular graph pretraining. Next, we systematically categorize existing pretrained GMs based on a taxonomy from four different perspectives including model architectures, pretraining strategies, tuning strategies, and applications. Finally, we outline several promising research directions that can serve as a guideline for future studies.
Jun Xia · Yanqiao Zhu · Yuanqi Du · Stan Z. Li 🔗 


Mesh-Independent Operator Learning for Partial Differential Equations
(
Poster
)
link »
SlidesLive Video » Operator learning, learning the mapping between function spaces, has attracted attention as an alternative approach to traditional numerical methods to solve partial differential equations. In this paper, we propose to represent the discretized system as set-valued data without a prior structure and construct the permutation-symmetric model, called mesh-independent neural operator (MINO), to provide proper treatments of input functions and query coordinates of the solution function. Our models pretrained with a benchmark dataset of operator learning are evaluated by downstream tasks to demonstrate the generalization abilities to varying discretization formats of the system, which are natural characteristics of the continuous solution of the PDEs.
Seungjun Lee 🔗 


Removing parasitic elements from Quantum Optical Coherence Tomography data with Convolutional Neural Networks
(
Poster
)
link »
SlidesLive Video » Quantum Optical Coherence Tomography (QOCT) is a non-contact and non-invasive light-based imaging method which is gaining attention due to its increased image resolution and quality. The biggest, yet unresolved, disadvantage of QOCT is artefacts, additional elements cluttering the images and leading to a loss of the structural information in the obtained images. In our work, a Convolutional Neural Network (CNN) is applied to remove artefacts from QOCT images. In our approach, we train our model with computer-generated data instead of experimental images. The preliminary results show that such an approach is successful in retrieving artefact-free structural information, even for multilayer objects, for which this information is lost due to the number of induced artefacts. The limitations and challenges associated with our approach are identified and discussed.
Krzysztof Maliszewski · Sylwia Kolenderska · Varvara Vetrova 🔗 


Multiresolution Equivariant Graph Variational Autoencoder
(
Poster
)
link »
SlidesLive Video » In this paper, we propose Multiresolution Equivariant Graph Variational Autoencoders (MGVAE), the first hierarchical generative model to learn and generate graphs in a multiresolution and equivariant manner. At each resolution level, MGVAE employs higher-order message passing to encode the graph while learning to partition it into mutually exclusive clusters and coarsening into a lower resolution that eventually creates a hierarchy of latent distributions. MGVAE then constructs a hierarchical generative model to variationally decode into a hierarchy of coarsened graphs. Importantly, our proposed framework is end-to-end permutation equivariant with respect to node ordering. MGVAE achieves competitive results on several generative tasks including general graph generation, molecular generation, unsupervised molecular representation learning to predict molecular properties, link prediction on citation graphs, and graph-based image generation.
Truong Son Hy · Risi Kondor 🔗 


Multiresolution Matrix Factorization and Wavelet Networks on Graphs
(
Poster
)
link »
SlidesLive Video » Multiresolution Matrix Factorization (MMF) is unusual amongst fast matrix factorization algorithms in that it does not make a low-rank assumption. This makes MMF especially well suited to modeling certain types of graphs with complex multiscale or hierarchical structure. While MMF promises to yield a useful wavelet basis, finding the factorization itself is hard, and existing greedy methods tend to be brittle. In this paper, we propose a "learnable" version of MMF that carefully optimizes the factorization with a combination of reinforcement learning and Stiefel manifold optimization through backpropagating errors. We show that the resulting wavelet basis far outperforms prior MMF algorithms and provides the first version of this type of factorization that can be robustly deployed on standard learning tasks. Furthermore, we construct the wavelet neural networks (WNNs) learning graphs on the spectral domain with the wavelet basis produced by our MMF learning algorithm. Our wavelet networks are competitive against other state-of-the-art methods in molecular graph classification and node classification on citation graphs.
Truong Son Hy · Risi Kondor 🔗 


Quantum Neural Architecture Search with Quantum Circuits Metric and Bayesian Optimization
(
Poster
)
Quantum neural networks are promising for a wide range of applications in the Noisy Intermediate-Scale Quantum era. As such, there is an increasing demand for automatic quantum neural architecture search. We tackle this challenge by designing a quantum circuits metric for Bayesian optimization with a Gaussian process. To this end, we develop a quantum gates distance that characterizes the gates' action over every quantum state and provide a theoretical perspective on its geometric properties. Our approach significantly outperforms the benchmark on three empirical quantum machine learning problems: training a quantum generative adversarial network, solving combinatorial optimization in the MaxCut problem, and simulating the quantum Fourier transform. Our method can be extended to characterize behaviors of various quantum machine learning models.
Trong Duong · Sang Truong · Minh Pham · Bao Bach · June-Koo Rhee 🔗


Variational Inference for Soil Biogeochemical Models
(
Poster
)
Soil biogeochemical models (SBMs) are an important tool used by Earth scientists to quantify the impact of rising global surface temperatures. SBMs represent the soil carbon and microbial dynamics across time as differential equations, and inference on model parameters is conducted to project changes in parameter values under warming climate conditions. Traditionally, the field has relied on MCMC algorithms for posterior inference, often implemented via probabilistic programming languages like Stan. However, computational cost makes it difficult to scale MCMC methods to more complex SBMs and large-scale datasets. In this paper, we develop variational inference methods for time-discretized SBMs as an alternative to MCMC. We propose an efficient family of variational approximations based on Gauss-Markov distributions that leverages the temporal structure of sequential models, scaling linearly in both time and space with respect to the sequence length. We show in experiments with simulated data and real CO$_2$ response ratios that our approach converges faster and recovers a posterior that captures uncertainty more accurately than previous variational methods. Our black-box inference approach is designed to integrate with probabilistic programming languages to enable future scientific applications.

Debora Sujono · Hua Xie · Steven Allison · Erik Sudderth 🔗 


Centralized vs Individual Models for Decision Making in Interconnected Infrastructure
(
Poster
)
The 2013 National Infrastructure Protection Plan outlines the need for interconnected infrastructure systems to coordinate more and recognize their interdependencies. We model the two extremes of this coordination spectrum using two different multi-agent models: (a) a centralized model, in which the agents are fully centralized and act as one unit in making decisions, and (b) an individual model, in which the agents act completely separately and have either a pessimistic or an optimistic assumption regarding the damages to the other infrastructure systems controlled by the other agents. We then use the individual model to establish a point along the coordination spectrum by providing the individual agents with delayed information regarding the other player(s). To test this framework, we use a small but illustrative model from a 2020 paper in which there is a power network and a water network, and we assume that there are operators for both networks who would like to maximize flow according to a specific metric. Our results comparing partially repaired networks using the two models find that (i) the centralized model acts as an upper bound on the individual model in terms of our flow metric and (ii) the delayed-information individual model leads to less variability in results than the other individual model assumptions, which points to the value of at least delayed coordination in decision making.
Stephanie Allen · John P Dickerson · Steven Gabriel 🔗 


An Optical Pulse Stacking Environment and Reinforcement Learning Benchmarks
(
Poster
)
Deep reinforcement learning has the potential to address various scientific problems. In this paper, we implement an optics simulation environment for reinforcement-learning-based controllers. The environment incorporates nonconvex and nonlinear optical phenomena as well as more realistic time-dependent noise. We then provide benchmark results for several state-of-the-art reinforcement learning algorithms in the proposed simulation environment. Finally, we discuss the difficulty of controlling the real-world optical environment with reinforcement learning algorithms. We will make the simulation environment and code publicly available.
Abulikemu Abuduweili · Changliu Liu 🔗 


$O(N^2)$ Universal Antisymmetry in Fermionic Neural Networks
(
Poster
)
The fermionic neural network (FermiNet) is a recently proposed wavefunction Ansatz, which is used in variational Monte Carlo (VMC) methods to solve the many-electron Schrödinger equation. FermiNet proposes permutation-equivariant architectures, on which a Slater determinant is applied to induce antisymmetry. FermiNet is proved to have universal approximation capability with a single determinant; namely, a single determinant suffices to represent any antisymmetric function given sufficient parameters. However, the asymptotic computational bottleneck comes from the Slater determinant, which scales as $O(N^3)$ for $N$ electrons. In this paper, we substitute the Slater determinant with a pairwise antisymmetry construction, which is easy to implement and can reduce the computational cost to $O(N^2)$. Furthermore, we formally prove that the pairwise construction built upon permutation-equivariant architectures can universally represent any antisymmetric function.

Tianyu Pang · Shuicheng Yan · Min Lin 🔗 


GAUCHE: A Library for Gaussian Processes in Chemistry
(
Poster
)
We introduce GAUCHE, a library for GAUssian processes in CHEmistry. Gaussian processes have long been a cornerstone of probabilistic machine learning, affording particular advantages for uncertainty quantification and Bayesian optimisation. Extending Gaussian processes to chemical representations, however, is nontrivial, necessitating kernels defined over structured inputs such as graphs, strings and bit vectors. By defining such kernels in GAUCHE, we seek to open the door to powerful tools for uncertainty quantification and Bayesian optimisation in chemistry. Motivated by scenarios frequently encountered in experimental chemistry, we showcase applications for GAUCHE in molecule discovery, chemical reaction optimisation and protein engineering.
Ryan-Rhys Griffiths · Leo Klarner · Henry Moss · Aditya Ravuri · Sang Truong · Yuanqi Du · Arian Jamasb · Julius Schwartz · Austin Tripp · Bojana Ranković · Philippe Schwaller · Gregory Kell · Anthony Bourached · Alexander Chan · Jacob Moss · Chengzhi Guo · Alpha Lee · Jian Tang



Bias in the Benchmark: Systematic experimental errors in bioactivity databases confound multi-task and meta-learning algorithms
(
Poster
)
There is considerable interest in employing deep learning algorithms to predict pharmaceutically relevant properties of small molecules. To overcome the issues inherent in this low-data regime, researchers are increasingly exploring multi-task and meta-learning algorithms that leverage sets of related biochemical and toxicological assays to learn robust and generalisable representations. However, we show that the data from which commonly used multi-task benchmarks are derived often exhibit systematic experimental errors that lead to confounding statistical dependencies across tasks. Representation learning models that aim to acquire an inductive bias in this domain risk compounding these biases and overfitting to patterns that are counterproductive to many downstream applications of interest. We investigate to what extent these issues are reflected in the molecular embeddings learned by multi-task graph neural networks and discuss methods to address this pathology.
Leo Klarner · Michael Reutlinger · Torsten Schindler · Charlotte Deane · Garrett Morris 🔗 


Deep Learning and Symbolic Regression for Discovering Parametric Equations
(
Poster
)
Symbolic regression is a machine learning technique that can learn the governing formulas from data and thus has the potential to transform scientific discovery. However, symbolic regression is still limited in the complexity of the systems that it can analyze. Deep learning, on the other hand, has transformed machine learning in its ability to analyze extremely complex and high-dimensional datasets. Here we develop a method that uses neural networks to extend symbolic regression to parametric systems where some coefficients may vary as a function of time but the underlying governing equation remains constant. We demonstrate our method on various analytic expressions and PDEs with varying coefficients and show that it extrapolates well outside of the training domain. The neural-network-based architecture can also integrate with other deep learning architectures so that it can analyze high-dimensional data while being trained end-to-end in a single step. To this end, we integrate our architecture with convolutional neural networks and train the system end-to-end to discover various physical quantities from 1D images of spring systems where the spring constant may vary.
Samuel Kim · Michael Zhang · Peter Y. Lu · Marin Soljačić 🔗


A Density Functional Recommendation Approach for Accurate Predictions of Vertical Spin Splitting of Transition Metal Complexes
(
Poster
)
Both conventional and machine-learning-based density functional approximations (DFAs) have emerged as versatile approaches for virtual high-throughput screening and chemical discovery. To date, however, no single DFA is universally accurate for different chemical spaces. This DFA sensitivity is particularly high for open-shell transition-metal-containing systems, where strong static correlation may dominate. With electron density fitting and transfer learning, we build a DFA recommender that selects the DFA with the lowest expected error in a system-dependent manner. We demonstrate this recommender approach on the prediction of vertical spin-splitting energies (i.e., the electronic energy difference between the high-spin and low-spin states) of challenging transition metal complexes. This recommender yields relatively small errors (i.e., 2.1 kcal/mol) for transition metal chemistry and captures the distributions of the DFAs that are most likely to be accurate.
Chenru Duan · Aditya Nandy · Heather Kulik 🔗 


A Density Functional Recommendation Approach for Accurate Predictions of Vertical Spin Splitting of Transition Metal Complexes
(
Oral
)
Both conventional and machine-learning-based density functional approximations (DFAs) have emerged as versatile approaches for virtual high-throughput screening and chemical discovery. To date, however, no single DFA is universally accurate for different chemical spaces. This DFA sensitivity is particularly high for open-shell transition-metal-containing systems, where strong static correlation may dominate. With electron density fitting and transfer learning, we build a DFA recommender that selects the DFA with the lowest expected error in a system-dependent manner. We demonstrate this recommender approach on the prediction of vertical spin-splitting energies (i.e., the electronic energy difference between the high-spin and low-spin states) of challenging transition metal complexes. This recommender yields relatively small errors (i.e., 2.1 kcal/mol) for transition metal chemistry and captures the distributions of the DFAs that are most likely to be accurate.
Chenru Duan · Aditya Nandy · Heather Kulik 🔗 


Graph Self-Supervised Learning for Optoelectronic Properties of Organic Semiconductors
(
Poster
)
The search for new high-performance organic semiconducting molecules is challenging due to the vastness of the chemical space. Machine learning methods, particularly deep learning models like graph neural networks (GNNs), have shown promising potential to address such challenges. However, practical applications of GNNs for chemistry are often limited by the availability of labelled data. Meanwhile, unlabelled molecular data is abundant and could potentially be utilized to alleviate the scarcity of labelled data. Here, we advocate the use of self-supervised learning to improve the performance of GNNs by pre-training them with unlabelled molecular data. We investigate regression problems involving ground and excited state properties, both relevant for optoelectronic properties of organic semiconductors. Additionally, we extend the self-supervised learning strategy to molecules in non-equilibrium configurations, which are important for studying the effects of disorder. In all cases, we obtain considerable performance improvement over results without pre-training, in particular when labelled training data is limited, and we attribute this improvement to the capability of self-supervised learning to identify structural similarity among unlabelled molecules.
Zaixi Zhang · Qi Liu · Shengyu Zhang · Chang-Yu (Kim) Hsieh · Liang Shi · Chee-Kong Lee 🔗


Carla P. Gomes
(
Talk
)

🔗 