Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions
Leo Klarner · Tim G. J. Rudner · Michael Reutlinger · Torsten Schindler · Garrett Morris · Charlotte Deane · Yee-Whye Teh

Wed Jul 26 05:00 PM -- 06:30 PM (PDT) @ Exhibit Hall 1 #124
Event URL: https://github.com/leojklarner/Q-SAVI

Accelerating the discovery of novel and more effective therapeutics is an important pharmaceutical problem in which deep learning is playing an increasingly significant role. However, real-world drug discovery tasks are often characterized by a scarcity of labeled data and significant covariate shift, a setting that poses a challenge to standard deep learning methods. In this paper, we present Q-SAVI, a probabilistic model able to address these challenges by encoding explicit prior knowledge of the data-generating process into a prior distribution over functions, presenting researchers with a transparent and probabilistically principled way to encode data-driven modeling preferences. Building on a novel, gold-standard bioactivity dataset that facilitates a meaningful comparison of models in an extrapolative regime, we explore different approaches to induce data shift and construct a challenging evaluation setup. We then demonstrate that using Q-SAVI to integrate contextualized prior knowledge of drug-like chemical space into the modeling process affords substantial gains in predictive accuracy and calibration, outperforming a broad range of state-of-the-art self-supervised pre-training and domain adaptation techniques.

Author Information

Leo Klarner (University of Oxford)
Tim G. J. Rudner (New York University)

I am a PhD Candidate in the Department of Computer Science at the University of Oxford, where I conduct research on probabilistic machine learning with Yarin Gal and Yee Whye Teh. My research interests span **Bayesian deep learning**, **variational inference**, and **reinforcement learning**. I am particularly interested in uncertainty quantification in deep learning, reinforcement learning as probabilistic inference, and probabilistic transfer learning. I am also a **Rhodes Scholar** and an **AI Fellow** at Georgetown University's Center for Security and Emerging Technology.

Michael Reutlinger
Torsten Schindler (Google)
Garrett Morris (University of Oxford)

# Biography

Professor Morris began as an Oxford undergraduate chemist, completing his Part II and DPhil with W. Graham Richards in molecular modelling and graphical protein sequence analysis. In 1991 he moved to The Scripps Research Institute, California, to work on developing the protein-ligand docking software, AutoDock. In 2000 he helped to launch the first biomedical volunteer computing project, FightAIDS@Home, which spawned other biomedical projects on IBM’s World Community Grid. He moved back to the UK in 2008 to work in the Oxford spinout, InhibOx (now Oxford Drug Design), doing ‘real-world’ drug discovery, where he co-supervised SABS CDT students, developed novel virtual screening methods, and spearheaded the use of cloud computing. He later worked at another Oxford spinout, Crysalin, developing novel protein engineering techniques for reliable protein crystallization. He is now an Associate Professor and works closely with Prof Deane in the Department of Statistics in the Oxford Protein Informatics Group (OPIG). Like Charlotte, he is also Programme Co-Director for the EPSRC & MRC SABS CDT (Systems Approaches to Biomedical Science Centre for Doctoral Training), which was renewed in 2019 as the EPSRC & MRC Sustainable Approaches to Biomedical Science: Responsible and Reproducible Research Centre for Doctoral Training, or EPSRC & MRC SABS R3 CDT. In September 2019, he was made Deputy Director of Graduate Studies here in the Department of Statistics.

# Research Interests

I am interested in the application of statistical methods and software development in computer-aided drug discovery, chiefly in high throughput docking, ligand-based virtual screening, network pharmacology and systems chemical biology, cheminformatics, and bioinformatics. I am particularly interested in the applications of machine learning, including active learning, deep learning, and generative AI.

Charlotte Deane (University of Oxford)

Charlotte is Professor of Structural Bioinformatics in the Department of Statistics at the University of Oxford and Chief Scientist of Biologics AI at Exscientia. She is also a co-director of the Systems Approaches to Biomedical Science Centre for Doctoral Training, which she founded in 2009. She served on SAGE, the UK Government’s Scientific Advisory Group for Emergencies, during the COVID-19 pandemic, and acted as UK Research and Innovation’s COVID-19 Response Director. She has held numerous senior roles at the University of Oxford and until recently was the Deputy Executive Chair of the UK’s Engineering and Physical Sciences Research Council. At Oxford, Charlotte leads the Oxford Protein Informatics Group (OPIG), who work on diverse problems across immunoinformatics, protein structure and small molecule drug discovery, using statistics, AI and computation to generate biological and medical insight. Her work focuses on the development of novel algorithms, tools and databases that are openly available to the community. These tools are widely used web resources and are also part of several Pharma drug discovery pipelines. Charlotte is on several advisory boards and has consulted extensively with industry. She has set up a consulting arm within her own research group as a way of promoting industrial interaction and use of the group’s software tools.

Yee-Whye Teh (Oxford and DeepMind)

More from the Same Authors