Workshop
Theoretical Physics for Deep Learning
Jaehoon Lee · Jeffrey Pennington · Yasaman Bahri · Max Welling · Surya Ganguli · Joan Bruna

Fri Jun 14 08:30 AM -- 06:00 PM (PDT) @ 104 C

Though the purview of physics is broad and includes many loosely connected subdisciplines, a unifying theme is the endeavor to provide concise, quantitative, and predictive descriptions of the often large and complex systems governing phenomena that occur in the natural world. While one could debate how closely deep learning is connected to the natural world, it is undeniably the case that deep learning systems are large and complex; as such, it is reasonable to consider whether the rich body of ideas and powerful tools from theoretical physicists could be harnessed to improve our understanding of deep learning. The goal of this workshop is to investigate this question by bringing together experts in theoretical physics and deep learning in order to stimulate interaction and to begin exploring how theoretical physics can shed light on the theory of deep learning.

We believe ICML is an appropriate venue for this gathering as members from both communities are frequently in attendance and because deep learning theory has emerged as a focus at the conference, both as an independent track in the main conference and in numerous workshops over the last few years. Moreover, the conference has enjoyed an increasing number of papers using physics tools and ideas to draw insights into deep learning.

Fri 8:30 a.m. - 8:40 a.m.
Opening Remarks
Jaehoon Lee, Jeffrey Pennington, Yasaman Bahri, Max Welling, Surya Ganguli, Joan Bruna
Fri 8:40 a.m. - 9:10 a.m.

Speaker: Andrea Montanari (Stanford)

Abstract: We consider the problem of learning an unknown function f on the d-dimensional sphere with respect to the square loss, given i.i.d. samples (y_i, x_i) where x_i is a feature vector uniformly distributed on the sphere and y_i = f(x_i). We study two popular classes of models that can be regarded as linearizations of two-layer neural networks around a random initialization: (RF) the random feature model of Rahimi-Recht; (NT) the neural tangent kernel model of Jacot-Gabriel-Hongler. Both approaches can also be regarded as randomized approximations of kernel ridge regression (with respect to different kernels) and hence enjoy universal approximation properties when the number of neurons N diverges, for a fixed dimension d.

We prove that, if both d and N are large, the behavior of these models is instead remarkably simpler. If N is of smaller order than d^2, then RF performs no better than linear regression with respect to the raw features x_i, and NT performs no better than linear regression with respect to degree-one and degree-two monomials in the x_i's. More generally, if N is of smaller order than d^{k+1}, then RF fits at most a degree-k polynomial in the raw features, and NT fits at most a degree-(k+1) polynomial. We then focus on the case of quadratic functions and N = O(d). We show that the gap in generalization error between fully trained neural networks and the linearized models is potentially unbounded. [Based on joint work with Behrooz Ghorbani, Song Mei, and Theodor Misiakiewicz.]
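
For reference, the two linearized model classes can be written schematically as follows (a sketch in the abstract's notation; the normalizations follow common conventions and may differ from the paper):

    f_RF(x; a) = \sum_{i=1}^{N} a_i \sigma(<w_i, x>)          (RF: w_i random and fixed, only the coefficients a_i are trained)
    f_NT(x; S) = \sum_{i=1}^{N} <s_i, x> \sigma'(<w_i, x>)    (NT: w_i random and fixed, only the vectors s_i are trained)

Both classes are linear in their trainable parameters, which is why they reduce to randomized approximations of kernel ridge regression with the corresponding RF and NT kernels.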

Andrea Montanari
Fri 9:10 a.m. - 9:40 a.m.

Speaker: Lenka Zdeborova (CEA Saclay)

Abstract: A key question of current interest is: How are the properties of optimization and sampling algorithms influenced by the properties of the loss function in noisy, high-dimensional, non-convex settings? Answering this question for deep neural networks is a landmark goal of many ongoing works. In this talk I will answer this question in unprecedented detail for the spiked matrix-tensor model. Information-theoretic limits and the Kac-Rice analysis of the loss landscape will be compared to the analytically studied performance of message-passing algorithms, of the Langevin dynamics, and of the gradient flow. Several rather non-intuitive results will be unveiled and explained.
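
For context, the spiked matrix-tensor model observes a hidden spike x* in R^N simultaneously through a rank-one matrix and an order-p tensor; schematically (a sketch up to normalization conventions, which vary in the literature):

    Y_{ij}          = x*_i x*_j / \sqrt{N}                 + noise of variance \Delta_2    (spiked matrix)
    T_{i_1 ... i_p} = x*_{i_1} ... x*_{i_p} / N^{(p-1)/2}  + noise of variance \Delta_p    (spiked tensor, p >= 3)

The task is to recover x* from (Y, T) by minimizing a suitable loss or sampling a posterior; the two noise levels (\Delta_2, \Delta_p) control how rugged the resulting high-dimensional non-convex landscape is, which is what makes the model a tractable testbed for the question above.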

Lenka Zdeborova
Fri 9:40 a.m. - 10:20 a.m.

A Quantum Field Theory of Representation Learning Robert Bamler (University of California at Irvine)*; Stephan Mandt (University of California, Irvine)

Covariance in Physics and Convolutional Neural Networks Miranda Cheng (University of Amsterdam)*; Vassilis Anagiannis (University of Amsterdam); Maurice Weiler (University of Amsterdam); Pim de Haan (University of Amsterdam); Taco S. Cohen (Qualcomm AI Research); Max Welling (University of Amsterdam)

Scale Steerable Filters for Locally Scale-Invariant Convolutional Neural Networks Rohan Ghosh (National University of Singapore)*; Anupam Gupta (National University of Singapore)

Towards a Definition of Disentangled Representations Irina Higgins (DeepMind)*; David Amos (DeepMind); Sebastien Racaniere (DeepMind); David Pfau (DeepMind); Loic Matthey (DeepMind); Danilo Jimenez Rezende (Google DeepMind)

Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes Roman Novak (Google Brain)*; Lechao Xiao (Google Brain); Jaehoon Lee (Google Brain); Yasaman Bahri (Google Brain); Greg Yang (Microsoft Research AI); Jiri Hron (University of Cambridge); Daniel Abolafia (Google Brain); Jeffrey Pennington (Google Brain); Jascha Sohl-Dickstein (Google Brain)

Finite size corrections for neural network Gaussian processes Joseph M Antognini (Whisper AI)*

Pathological Spectrum of the Fisher Information Matrix in Deep Neural Networks Ryo Karakida (National Institute of Advanced Industrial Science and Technology)*; Shotaro Akaho (AIST); Shun-ichi Amari (RIKEN)

Inferring the quantum density matrix with machine learning Kyle Cranmer (New York University); Siavash Golkar (NYU)*; Duccio Pappadopulo (Bloomberg)

Jet grooming through reinforcement learning Frederic Dreyer (University of Oxford)*; Stefano Carrazza (University of Milan)

Roman Novak, Frederic Dreyer, Siavash Golkar, Irina Higgins, Joe Antognini, Ryo Karakida, Rohan Ghosh
Fri 10:20 a.m. - 11:00 a.m.

Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes Roman Novak (Google Brain); Lechao Xiao (Google Brain); Jaehoon Lee (Google Brain); Yasaman Bahri (Google Brain); Greg Yang (Microsoft Research AI); Jiri Hron (University of Cambridge); Daniel Abolafia (Google Brain); Jeffrey Pennington (Google Brain); Jascha Sohl-Dickstein (Google Brain)

Topology of Learning in Artificial Neural Networks Maxime Gabella (Magma Learning)

Jet grooming through reinforcement learning Frederic Dreyer (University of Oxford); Stefano Carrazza (University of Milan)

Inferring the quantum density matrix with machine learning Kyle Cranmer (New York University); Siavash Golkar (NYU); Duccio Pappadopulo (Bloomberg)

Backdrop: Stochastic Backpropagation Siavash Golkar (NYU); Kyle Cranmer (New York University)

Explain pathology in Deep Gaussian Process using Chaos Theory Anh Tong (UNIST); Jaesik Choi (Ulsan National Institute of Science and Technology)

Towards a Definition of Disentangled Representations Irina Higgins (DeepMind); David Amos (DeepMind); Sebastien Racaniere (DeepMind); David Pfau (DeepMind); Loic Matthey (DeepMind); Danilo Jimenez Rezende (DeepMind)

Towards Understanding Regularization in Batch Normalization Ping Luo (The Chinese University of Hong Kong); Xinjiang Wang (); Wenqi Shao (The Chinese University of Hong Kong); Zhanglin Peng (SenseTime)

Covariance in Physics and Convolutional Neural Networks Miranda Cheng (University of Amsterdam); Vassilis Anagiannis (University of Amsterdam); Maurice Weiler (University of Amsterdam); Pim de Haan (University of Amsterdam); Taco S. Cohen (Qualcomm AI Research); Max Welling (University of Amsterdam)

Meanfield theory of activation functions in Deep Neural Networks Mirco Milletari (Microsoft); Thiparat Chotibut (SUTD); Paolo E. Trevisanutto (National University of Singapore)

Finite size corrections for neural network Gaussian processes Joseph M Antognini (Whisper AI)*

SWANN: Small-World Neural Networks and Rapid Convergence Mojan Javaheripi (UC San Diego); Bita Darvish Rouhani (UC San Diego); Farinaz Koushanfar (UC San Diego)

Analysing the dynamics of online learning in over-parameterised two-layer neural networks Sebastian Goldt (Institut de Physique théorique, Paris); Madhu Advani (Harvard University); Andrew Saxe (University of Oxford); Florent Krzakala (École Normale Supérieure); Lenka Zdeborova (CEA Saclay)

A Halo Merger Tree Generation and Evaluation Framework Sandra Robles (Universidad Autónoma de Madrid); Jonathan Gómez (Pontificia Universidad Católica de Chile); Adín Ramírez Rivera (University of Campinas); Jenny Gonzáles (Pontificia Universidad Católica de Chile); Nelson Padilla (Pontificia Universidad Católica de Chile); Diego Dujovne (Universidad Diego Portales)

Learning Symmetries of Classical Integrable Systems Roberto Bondesan (Qualcomm AI Research); Austen Lamacraft (Cavendish Laboratory, University of Cambridge, UK)

Cosmology inspired generative models Uros Seljak (UC Berkeley); Francois Lanusse (UC Berkeley)

Pathological Spectrum of the Fisher Information Matrix in Deep Neural Networks Ryo Karakida (National Institute of Advanced Industrial Science and Technology); Shotaro Akaho (AIST); Shun-ichi Amari (RIKEN)

How Noise during Training Affects the Hessian Spectrum Mingwei Wei (Northwestern University); David Schwab (Facebook AI Research)*

A Quantum Field Theory of Representation Learning Robert Bamler (University of California at Irvine); Stephan Mandt (University of California, Irvine)

Convergence Properties of Neural Networks on Separable Data Remi Tachet des Combes (Microsoft Research Montreal); Mohammad Pezeshki (Mila & University of Montreal); Samira Shabanian (Microsoft, Canada); Aaron Courville (MILA, Université de Montréal); Yoshua Bengio (Mila)

Universality and Capacity Metrics in Deep Neural Networks Michael Mahoney (University of California, Berkeley); Charles Martin (Calculation Consulting)

Feynman Diagrams for Large Width Networks Guy Gur-Ari (Google); Ethan Dyer (Google)

Deep Learning on the 2-Dimensional Ising Model to Extract the Crossover Region Nicholas Walker (Louisiana State Univ - Baton Rouge)*

Large Scale Structure of the Loss Landscape of Neural Networks Stanislav Fort (Stanford University); Stanislaw Jastrzebski (New York University)

Momentum Enables Large Batch Training Samuel L Smith (DeepMind); Erich Elsen (Google); Soham De (DeepMind)

Learning the Arrow of Time Nasim Rahaman (University of Heidelberg); Steffen Wolf (Heidelberg University); Anirudh Goyal (University of Montreal); Roman Remme (Heidelberg University); Yoshua Bengio (Mila)

Scale Steerable Filters for Locally Scale-Invariant Convolutional Neural Networks Rohan Ghosh (National University of Singapore); Anupam Gupta (National University of Singapore)

A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off Yaniv Blumenfeld (Technion); Dar Gilboa (Columbia University); Daniel Soudry (Technion)

Rethinking Complexity in Deep Learning: A View from Function Space Aristide Baratin (Mila, Université de Montréal); Thomas George (MILA, Université de Montréal); César Laurent (Mila, Université de Montréal); Valentin Thomas (MILA); Guillaume Lajoie (Université de Montréal, Mila); Simon Lacoste-Julien (Mila, Université de Montréal)

The Deep Learning Limit: Negative Neural Network eigenvalues just noise? Diego Granziol (Oxford); Stefan Zohren (University of Oxford); Stephen Roberts (Oxford); Dmitry P Vetrov (Higher School of Economics); Andrew Gordon Wilson (Cornell University); Timur Garipov (Samsung AI Center in Moscow)

Gradient descent in Gaussian random fields as a toy model for high-dimensional optimisation Mariano Chouza (Tower Research Capital); Stephen Roberts (Oxford); Stefan Zohren (University of Oxford)

Deep Learning for Inverse Problems Abhejit Rajagopal (University of California, Santa Barbara)*; Vincent R Radzicki (University of California, Santa Barbara)

Fri 11:00 a.m. - 11:30 a.m.

Speaker: Kyle Cranmer (NYU)

Abstract: The interplay between physics and deep learning is typically divided into two themes. The first is “physics for deep learning”, where techniques from physics are brought to bear on understanding the dynamics of learning. The second is “deep learning for physics”, which focuses on the application of deep learning techniques to physics problems. I will present a more nuanced view of this interplay, with examples of how the structure of physics problems has inspired advances in deep learning and how it yields insights into topics such as inductive bias, interpretability, and causality.

Kyle Cranmer
Fri 11:30 a.m. - 12:00 p.m.

Speaker: Michael Mahoney (ICSI and Department of Statistics, University of California at Berkeley)

Abstract: Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production-quality, pre-trained models and smaller models trained from scratch. Empirical and theoretical results clearly indicate that the DNN training process itself implicitly implements a form of self-regularization, implicitly sculpting a more regularized energy or penalty landscape. In particular, the empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models, even in the absence of exogenously specifying traditional forms of explicit regularization. Building on relatively recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, and applying them to these empirical results, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of implicit self-regularization. For smaller and/or older DNNs, this implicit self-regularization is like traditional Tikhonov regularization, in that there appears to be a "size scale" separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of heavy-tailed self-regularization, similar to the self-organization seen in the statistical physics of disordered systems. This implicit self-regularization can depend strongly on the many knobs of the training process. In particular, by exploiting the generalization gap phenomena, we demonstrate that we can cause a small model to exhibit all 5+1 phases of training simply by changing the batch size. This demonstrates that, all else being equal, DNN optimization with larger batch sizes leads to less-well implicitly-regularized models, and it provides an explanation for the generalization gap phenomena. Coupled with work on energy landscapes and heavy-tailed spin glasses, it also suggests an explanation of why deep learning works. Joint work with Charles Martin of Calculation Consulting, Inc.
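
As a concrete illustration of the kind of analysis described above, the following minimal sketch computes the empirical spectral density of a single layer's weight matrix and a crude power-law tail-index estimate; the choice of torchvision's resnet18 and of its final fully connected layer is illustrative, not the authors' pipeline:

    # Sketch: empirical spectral density (ESD) of one DNN layer's correlation matrix.
    # Assumes numpy and torchvision are available; any (N, M) weight matrix would do.
    import numpy as np
    import torchvision.models as models

    model = models.resnet18(pretrained=True)       # illustrative pretrained model
    W = model.fc.weight.detach().cpu().numpy()     # final fully connected layer, shape (N, M)

    N, M = W.shape
    X = W.T @ W / N                                # layer correlation matrix, shape (M, M)
    eigs = np.linalg.eigvalsh(X)                   # its eigenvalue distribution is the ESD

    # Heavy-tailed self-regularization shows up as a power-law tail in the ESD.
    # Crude tail-index estimate: power-law MLE over the k largest eigenvalues.
    k = 50
    tail = np.sort(eigs)[-k:]
    alpha = 1.0 + k / np.sum(np.log(tail / tail.min()))
    print(f"largest eigenvalue: {eigs.max():.3f}, tail-index estimate: {alpha:.2f}")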

Michael Mahoney
Fri 12:00 p.m. - 12:15 p.m.
Analyzing the dynamics of online learning in over-parameterized two-layer neural networks (Oral)
Sebastian Goldt
Fri 12:15 p.m. - 12:30 p.m.
Convergence Properties of Neural Networks on Separable Data (Oral)
Remi Tachet des Combes
Fri 12:30 p.m. - 2:00 p.m.
Lunch (Break)
Fri 2:00 p.m. - 2:30 p.m.

Speaker: Sanjeev Arora (Princeton/IAS)

Abstract: There is an old debate in neuroscience about whether or not learning has to boil down to optimizing a single cost function. This talk will suggest that even to understand mathematical properties of deep learning, we have to go beyond the conventional view of "optimizing a single cost function". The reason is that phenomena occur along the gradient descent trajectory that are not fully captured in the value of the cost function. I will illustrate briefly with three new results that involve such phenomena:

(i) (joint work with Cohen, Hu, and Luo) How deep matrix factorization solves matrix completion better than classical algorithms https://arxiv.org/abs/1905.13655

(ii) (joint with Du, Hu, Li, Salakhutdinov, and Wang) How to compute (exactly) with an infinitely wide net ("mean field limit", in physics terms) https://arxiv.org/abs/1904.11955

(iii) (joint with Kuditipudi, Wang, Hu, Lee, Zhang, Li, Ge) Explaining mode-connectivity for real-life deep nets (the phenomenon that low-cost solutions found by gradient descent are interconnected in the parameter space via low-cost paths; see Garipov et al. '18 and Draxler et al. '18)
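
To make item (i) above concrete, here is a minimal sketch (not the paper's code) of gradient descent on a depth-L factorization X = W_1 W_2 ... W_L, fitted only to the observed entries of a low-rank matrix; all hyperparameters are illustrative and untuned:

    # Sketch: matrix completion via deep matrix factorization, trained by plain gradient descent.
    import numpy as np

    rng = np.random.default_rng(0)
    n, rank, depth, lr, steps = 30, 2, 3, 0.2, 10000

    # Ground-truth low-rank matrix (normalized) and a random mask of observed entries.
    M = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, n))
    M /= np.linalg.norm(M, 2)
    mask = rng.random((n, n)) < 0.3

    def prod(mats):
        out = np.eye(n)
        for W in mats:
            out = out @ W
        return out

    # Small near-identity initialization; smaller init strengthens the implicit bias toward low rank.
    Ws = [0.1 * np.eye(n) + 1e-3 * rng.standard_normal((n, n)) for _ in range(depth)]

    for _ in range(steps):
        R = mask * (prod(Ws) - M)                           # residual on observed entries only
        grads = [prod(Ws[:i]).T @ R @ prod(Ws[i + 1:]).T    # d/dW_i of 0.5 * ||R||_F^2
                 for i in range(depth)]
        Ws = [W - lr * g for W, g in zip(Ws, grads)]

    X = prod(Ws)
    err = np.linalg.norm((~mask) * (X - M)) / np.linalg.norm((~mask) * M)
    print(f"relative error on unobserved entries: {err:.3f}")

The point of item (i) is that, for depth greater than one, gradient descent on this parametrization exhibits a stronger implicit bias toward low-rank solutions than classical norm-minimization approaches to matrix completion.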

Sanjeev Arora
Fri 2:30 p.m. - 2:45 p.m.
Towards Understanding Regularization in Batch Normalization (Oral)
Fri 2:45 p.m. - 3:00 p.m.
How Noise during Training Affects the Hessian Spectrum (Oral)
Fri 3:00 p.m. - 3:30 p.m.
Break and poster discussion (Break and Poster)
Fri 3:30 p.m. - 4:00 p.m.

Speaker: Jascha Sohl-Dickstein (Google Brain)

Abstract: As neural networks become highly overparameterized, their accuracy improves, and their behavior becomes easier to analyze theoretically. I will give an introduction to a rapidly growing body of work which examines the learning dynamics and prior over functions induced by infinitely wide, randomly initialized neural networks. Core results that I will discuss include: that the distribution over functions computed by a wide neural network often corresponds to a Gaussian process with a particular compositional kernel, both before and after training; that the predictions of wide neural networks are linear in their parameters throughout training; and that this perspective enables analytic predictions for how trainability depends on hyperparameters and architecture. These results enable surprising capabilities, such as evaluating the test set predictions of an infinitely wide trained neural network without ever instantiating a neural network, or rapidly training convolutional networks with 10,000+ layers. I will argue that this growing understanding of neural networks in the limit of infinite width is foundational for future theoretical and practical understanding of deep learning.
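
A minimal sketch of the "particular compositional kernel" for the fully connected ReLU case (the standard NNGP / arccosine-kernel recursion); the variance parameters and the GP-regression readout below are illustrative:

    # Sketch: NNGP kernel of an infinitely wide, fully connected ReLU network,
    # and the corresponding GP posterior mean on test points.
    import numpy as np

    def nngp_kernel(X1, X2, depth=3, s_w2=2.0, s_b2=0.0):
        """Kernel between rows of X1 and X2 after `depth` ReLU layers."""
        d = X1.shape[1]
        K = s_b2 + s_w2 * X1 @ X2.T / d                  # input-layer kernel
        K11 = s_b2 + s_w2 * np.sum(X1 * X1, axis=1) / d  # diagonal terms K(x, x)
        K22 = s_b2 + s_w2 * np.sum(X2 * X2, axis=1) / d
        for _ in range(depth):
            norms = np.sqrt(np.outer(K11, K22))
            theta = np.arccos(np.clip(K / norms, -1.0, 1.0))
            # E[relu(u) relu(v)] for centered Gaussians -> arccosine kernel
            K = s_b2 + (s_w2 / (2 * np.pi)) * norms * (np.sin(theta) + (np.pi - theta) * np.cos(theta))
            K11 = s_b2 + (s_w2 / 2.0) * K11              # same recursion evaluated at theta = 0
            K22 = s_b2 + (s_w2 / 2.0) * K22
        return K

    # GP regression with this kernel gives the posterior mean prediction that corresponds
    # to Bayesian inference with the infinitely wide network (the NNGP correspondence).
    rng = np.random.default_rng(0)
    Xtr, ytr, Xte = rng.standard_normal((20, 5)), rng.standard_normal(20), rng.standard_normal((3, 5))
    Ktr = nngp_kernel(Xtr, Xtr) + 1e-6 * np.eye(len(Xtr))
    print(nngp_kernel(Xte, Xtr) @ np.linalg.solve(Ktr, ytr))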

Jascha Sohl-Dickstein
Fri 4:00 p.m. - 4:15 p.m.
Asymptotics of Wide Networks from Feynman Diagrams (Oral)
Guy Gur-Ari
Fri 4:15 p.m. - 4:30 p.m.
A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off (Oral)
Dar Gilboa
Fri 4:30 p.m. - 4:45 p.m.
Deep Learning on the 2-Dimensional Ising Model to Extract the Crossover Region (Oral)
Nick Walker
Fri 4:45 p.m. - 5:00 p.m.
Learning the Arrow of Time (Oral)
Nasim Rahaman
Fri 5:00 p.m. - 6:00 p.m.

Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes Roman Novak (Google Brain)*; Lechao Xiao (Google Brain); Jaehoon Lee (Google Brain); Yasaman Bahri (Google Brain); Greg Yang (Microsoft Research AI); Jiri Hron (University of Cambridge); Daniel Abolafia (Google Brain); Jeffrey Pennington (Google Brain); Jascha Sohl-Dickstein (Google Brain)

Topology of Learning in Artificial Neural Networks Maxime Gabella (Magma Learning)*

Jet grooming through reinforcement learning Frederic Dreyer (University of Oxford)*; Stefano Carrazza (University of Milan)

Inferring the quantum density matrix with machine learning Kyle Cranmer (New York University); Siavash Golkar (NYU)*; Duccio Pappadopulo (Bloomberg)

Backdrop: Stochastic Backpropagation Siavash Golkar (NYU)*; Kyle Cranmer (New York University)

Explain pathology in Deep Gaussian Process using Chaos Theory Anh Tong (UNIST)*; Jaesik Choi (Ulsan National Institute of Science and Technology)

Towards a Definition of Disentangled Representations Irina Higgins (DeepMind)*; David Amos (DeepMind); Sebastien Racaniere (DeepMind); David Pfau (DeepMind); Loic Matthey (DeepMind); Danilo Jimenez Rezende (DeepMind)

Towards Understanding Regularization in Batch Normalization Ping Luo (The Chinese University of Hong Kong); Xinjiang Wang (); Wenqi Shao (The Chinese University of Hong Kong)*; Zhanglin Peng (SenseTime)

Covariance in Physics and Convolutional Neural Networks Miranda Cheng (University of Amsterdam)*; Vassilis Anagiannis (University of Amsterdam); Maurice Weiler (University of Amsterdam); Pim de Haan (University of Amsterdam); Taco S. Cohen (Qualcomm AI Research); Max Welling (University of Amsterdam)

Meanfield theory of activation functions in Deep Neural Networks Mirco Milletari (Microsoft)*; Thiparat Chotibut (SUTD); Paolo E. Trevisanutto (National University of Singapore)

Finite size corrections for neural network Gaussian processes Joseph M Antognini (Whisper AI)*

Analysing the dynamics of online learning in over-parameterised two-layer neural networks Sebastian Goldt (Institut de Physique théorique, Paris)*; Madhu Advani (Harvard University); Andrew Saxe (University of Oxford); Florent Krzakala (École Normale Supérieure); Lenka Zdeborova (CEA Saclay)

A Halo Merger Tree Generation and Evaluation Framework Sandra Robles (Universidad Autónoma de Madrid); Jonathan Gómez (Pontificia Universidad Católica de Chile); Adín Ramírez Rivera (University of Campinas)*; Jenny Gonzáles (Pontificia Universidad Católica de Chile); Nelson Padilla (Pontificia Universidad Católica de Chile); Diego Dujovne (Universidad Diego Portales)

Learning Symmetries of Classical Integrable Systems Roberto Bondesan (Qualcomm AI Research)*, Austen Lamacraft (Cavendish Laboratory, University of Cambridge, UK)

Pathological Spectrum of the Fisher Information Matrix in Deep Neural Networks Ryo Karakida (National Institute of Advanced Industrial Science and Technology)*; Shotaro Akaho (AIST); Shun-ichi Amari (RIKEN)

How Noise during Training Affects the Hessian Spectrum Mingwei Wei (Northwestern University); David Schwab (Facebook AI Research)*

A Quantum Field Theory of Representation Learning Robert Bamler (University of California at Irvine)*; Stephan Mandt (University of California, Irvine)

Convergence Properties of Neural Networks on Separable Data Remi Tachet des Combes (Microsoft Research Montreal)*; Mohammad Pezeshki (Mila & University of Montreal); Samira Shabanian (Microsoft, Canada); Aaron Courville (MILA, Université de Montréal); Yoshua Bengio (Mila)

Universality and Capacity Metrics in Deep Neural Networks Michael Mahoney (University of California, Berkeley)*; Charles Martin (Calculation Consulting)

Asymptotics of Wide Networks from Feynman Diagrams Guy Gur-Ari (Google)*; Ethan Dyer (Google)

Deep Learning on the 2-Dimensional Ising Model to Extract the Crossover Region Nicholas Walker (Louisiana State Univ - Baton Rouge)*

Large Scale Structure of the Loss Landscape of Neural Networks Stanislav Fort (Stanford University)*; Stanislaw Jastrzebski (New York University)

Momentum Enables Large Batch Training Samuel L Smith (DeepMind)*; Erich Elsen (Google); Soham De (DeepMind)

Learning the Arrow of Time Nasim Rahaman (University of Heidelberg)*; Steffen Wolf (Heidelberg University); Anirudh Goyal (University of Montreal); Roman Remme (Heidelberg University); Yoshua Bengio (Mila)

Scale Steerable Filters for Locally Scale-Invariant Convolutional Neural Networks Rohan Ghosh (National University of Singapore)*; Anupam Gupta (National University of Singapore)

A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off Yaniv Blumenfeld (Technion)*; Dar Gilboa (Columbia University); Daniel Soudry (Technion)

Rethinking Complexity in Deep Learning: A View from Function Space Aristide Baratin (Mila, Université de Montréal)*; Thomas George (MILA, Université de Montréal); César Laurent (Mila, Université de Montréal); Valentin Thomas (MILA); Guillaume Lajoie (Université de Montréal, Mila); Simon Lacoste-Julien (Mila, Université de Montréal)

The Deep Learning Limit: Negative Neural Network eigenvalues just noise? Diego Granziol (Oxford)*; Stefan Zohren (University of Oxford); Stephen Roberts (Oxford); Dmitry P Vetrov (Higher School of Economics); Andrew Gordon Wilson (Cornell University); Timur Garipov (Samsung AI Center in Moscow)

Gradient descent in Gaussian random fields as a toy model for high-dimensional optimisation Mariano Chouza (Tower Research Capital); Stephen Roberts (Oxford); Stefan Zohren (University of Oxford)*

Deep Learning for Inverse Problems Abhejit Rajagopal (University of California, Santa Barbara)*; Vincent R Radzicki (University of California, Santa Barbara)

Roman Novak, Maxime Gabella, Frederic Dreyer, Siavash Golkar, Anh Tong, Irina Higgins, Mirco Milletari, Joe Antognini, Sebastian Goldt, Adín Ramírez Rivera, Roberto Bondesan, Ryo Karakida, Remi Tachet des Combes, Michael Mahoney, Nick Walker, Stanislav Fort, Samuel Smith, Rohan Ghosh, Aristide Baratin, Diego Granziol, Stephen Roberts, Dmitry Vetrov, Andrew Wilson, César Laurent, Valentin Thomas, Simon Lacoste-Julien, Dar Gilboa, Daniel Soudry, Anupam Gupta, Anirudh Goyal, Yoshua Bengio, Erich Elsen, Soham De, Stanislaw Jastrzebski, Charles H Martin, Samira Shabanian, Aaron Courville, Shotaro Akaho, Lenka Zdeborova, Ethan Dyer, Maurice Weiler, Pim de Haan, Taco Cohen, Max Welling, Ping Luo, Zhanglin Peng, Nasim Rahaman, Loic Matthey, Danilo J. Rezende, Jaesik Choi, Kyle Cranmer, Lechao Xiao, Jaehoon Lee, Yasaman Bahri, Jeffrey Pennington, Greg Yang, Jiri Hron, Jascha Sohl-Dickstein, Guy Gur-Ari

Author Information

Jaehoon Lee (Google Brain)
Jeffrey Pennington (Google Brain)
Yasaman Bahri (Google Brain)
Max Welling (University of Amsterdam & Qualcomm)
Surya Ganguli (Stanford)
Joan Bruna (New York University)
