Timezone: »

Protein Design with Guided Discrete Diffusion
Nate Gruver · Samuel Stanton · Nathan Frey · Tim G. J. Rudner · Isidro Hotzel · Julien Lafrance-Vanasse · Arvind Rajpal · Kyunghyun Cho · Andrew Wilson
Event URL: https://openreview.net/forum?id=YaFTu8nDbV »

A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling. The generative model samples plausible sequences while the discriminative model guides a search for sequences with high fitness. Given its broad success in conditional sampling, classifier-guided diffusion modeling is a promising foundation for protein design, leading many to develop guided diffusion models for structure with inverse folding to recover sequences. In this work, we propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models that follows gradients in the hidden states of the denoising network. NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods, including scarce data and challenging inverse design. Moreover, we use NOS to generalize LaMBO, a Bayesian optimization procedure for sequence design that facilitates multiple objectives and edit-based constraints. The resulting method, LaMBO-2, enables discrete diffusions and stronger performance with limited edits through a novel application of saliency maps. We apply LaMBO-2 to a real-world protein design task, optimizing antibodies for higher expression yield and binding affinity to a therapeutic target under locality and liability constraints, with 97% expression rate and 25% binding rate in exploratory in vitro experiments.

Author Information

Nate Gruver (New York University)
Samuel Stanton (Genentech)
Nathan Frey (MIT)
Tim G. J. Rudner (New York University)

I am a PhD Candidate in the Department of Computer Science at the University of Oxford, where I conduct research on probabilistic machine learning with Yarin Gal and Yee Whye Teh. My research interests span **Bayesian deep learning**, **variational inference**, and **reinforcement learning**. I am particularly interested in uncertainty quantification in deep learning, reinforcement learning as probabilistic inference, and probabilistic transfer learning. I am also a **Rhodes Scholar** and an **AI Fellow** at Georgetown University's Center for Security and Emerging Technology.

Isidro Hotzel (Genentech)
Julien Lafrance-Vanasse (Genentech)
Arvind Rajpal (University of California, Berkeley)
Kyunghyun Cho (New York University, Genentech)
Kyunghyun Cho

Kyunghyun Cho is an associate professor of computer science and data science at New York University and CIFAR Fellow of Learning in Machines & Brains. He is also a senior director of frontier research at the Prescient Design team within Genentech Research & Early Development (gRED). He was a research scientist at Facebook AI Research from June 2017 to May 2020 and a postdoctoral fellow at University of Montreal until Summer 2015 under the supervision of Prof. Yoshua Bengio, after receiving MSc and PhD degrees from Aalto University April 2011 and April 2014, respectively, under the supervision of Prof. Juha Karhunen, Dr. Tapani Raiko and Dr. Alexander Ilin. He received the Samsung Ho-Am Prize in Engineering in 2021. He tries his best to find a balance among machine learning, natural language processing, and life, but almost always fails to do so.

Andrew Wilson (New York University)

More from the Same Authors