Though the purview of physics is broad and includes many loosely connected subdisciplines, a unifying theme is the endeavor to provide concise, quantitative, and predictive descriptions of the often large and complex systems that govern phenomena in the natural world. While one could debate how closely deep learning is connected to the natural world, deep learning systems are undeniably large and complex; as such, it is reasonable to ask whether the rich body of ideas and powerful tools developed by theoretical physicists could be harnessed to improve our understanding of deep learning. The goal of this workshop is to investigate this question by bringing together experts in theoretical physics and deep learning, in order to stimulate interaction between the two communities and to begin exploring how theoretical physics can shed light on the theory of deep learning.
We believe ICML is an appropriate venue for this gathering because members of both communities frequently attend and because deep learning theory has emerged as a focus of the conference, both as an independent track in the main conference and in numerous workshops over the last few years. Moreover, the conference has seen an increasing number of papers that use ideas and tools from physics to gain insight into deep learning.
Fri 8:30 a.m. – 8:40 a.m.

Opening Remarks

Jaehoon Lee, Jeffrey Pennington, Yasaman Bahri, Max Welling, Surya Ganguli, Joan Bruna 
Fri 8:40 a.m. – 9:10 a.m.

Linearized two-layers neural networks in high dimension
(Invited talk)
Speaker: Andrea Montanari (Stanford)
Abstract: We consider the problem of learning an unknown function f on the d-dimensional sphere with respect to the square loss, given i.i.d. samples (y_i, x_i), where x_i is a feature vector uniformly distributed on the sphere and y_i = f(x_i). We study two popular classes of models that can be regarded as linearizations of two-layers neural networks around a random initialization: (RF) the random feature model of Rahimi-Recht; (NT) the neural tangent kernel model of Jacot-Gabriel-Hongler. Both of these approaches can also be regarded as randomized approximations of kernel ridge regression (with respect to different kernels), and hence enjoy universal approximation properties when the number of neurons N diverges, for a fixed dimension d. We prove that, if both d and N are large, the behavior of these models is instead remarkably simpler. If N is of smaller order than d^2, then RF performs no better than linear regression with respect to the raw features x_i, and NT performs no better than linear regression with respect to degree-one and degree-two monomials in the x_i's. More generally, if N is of smaller order than d^{k+1}, then RF fits at most a degree-k polynomial in the raw features, and NT fits at most a degree-(k+1) polynomial. We then focus on the case of quadratic functions and N = O(d). We show that the gap in generalization error between fully trained neural networks and the linearized models is potentially unbounded. [Based on joint work with Behrooz Ghorbani, Song Mei, and Theodor Misiakiewicz.]
Andrea Montanari 
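The two linearizations differ only in which parameters are trained. The sketch below is a minimal NumPy illustration, assuming a ReLU activation, first-layer weights drawn uniformly on the unit sphere, and inputs on the sphere of radius sqrt(d): the RF model freezes the first layer and fits the second-layer weights a_i in f(x) = sum_i a_i sigma(<w_i, x>), while the NT model fits first-layer perturbations b_i in f(x) = sum_i <b_i, x> sigma'(<w_i, x>). Both are fitted here by ridge regression on explicit features; the function names, the sizes, the ridge penalty `lam`, and the degree-two target x_1 x_2 are hypothetical choices, and at such small sizes the sketch makes no claim to reproduce the asymptotic thresholds from the talk.

```python
import numpy as np


def sample_sphere(n, d, rng):
    """Draw n points uniformly from the sphere of radius sqrt(d) in R^d."""
    z = rng.standard_normal((n, d))
    return np.sqrt(d) * z / np.linalg.norm(z, axis=1, keepdims=True)


def rf_features(X, W):
    """RF model: first layer frozen, features sigma(<w_i, x>) (ReLU assumed)."""
    return np.maximum(X @ W.T, 0.0)  # shape (n, N)


def nt_features(X, W):
    """NT model: gradient w.r.t. first-layer weights, sigma'(<w_i, x>) x,
    concatenated over the N neurons and scaled by 1/sqrt(d)."""
    n, d = X.shape
    gates = (X @ W.T > 0).astype(float)  # ReLU derivative, shape (n, N)
    return (gates[:, :, None] * X[:, None, :]).reshape(n, -1) / np.sqrt(d)


def ridge_fit_predict(F_train, y_train, F_test, lam=1e-3):
    """Ridge regression on explicit features; returns test predictions."""
    p = F_train.shape[1]
    A = F_train.T @ F_train + lam * np.eye(p)
    coef = np.linalg.solve(A, F_train.T @ y_train)
    return F_test @ coef


def target(X):
    """Hypothetical degree-two target f(x) = x_1 * x_2."""
    return X[:, 0] * X[:, 1]


def demo(d=20, N=40, n_train=400, n_test=200, seed=0):
    """Return the test mean squared error of the RF and NT models."""
    rng = np.random.default_rng(seed)
    X_tr, X_te = sample_sphere(n_train, d, rng), sample_sphere(n_test, d, rng)
    W = sample_sphere(N, d, rng) / np.sqrt(d)  # first-layer weights on the unit sphere
    errors = {}
    for name, feats in (("RF", rf_features), ("NT", nt_features)):
        pred = ridge_fit_predict(feats(X_tr, W), target(X_tr), feats(X_te, W))
        errors[name] = float(np.mean((pred - target(X_te)) ** 2))
    return errors


print(demo())
```

Note that with the same N neurons the NT map has N*d features while the RF map has only N, so NT is the richer of the two linear models.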
Fri 9:10 a.m. – 9:40 a.m.

Loss landscape and behaviour of algorithms in the spiked matrix-tensor model
(Invited talk)
Speaker: Lenka Zdeborova (CEA Saclay)
Abstract: A key question of current interest is: how are the properties of optimization and sampling algorithms influenced by the properties of the loss function in noisy, high-dimensional, non-convex settings? Answering this question for deep neural networks is a landmark goal of many ongoing works. In this talk I will answer this question in unprecedented detail for the spiked matrix-tensor model. Information-theoretic limits and Kac-Rice analysis of the loss landscapes will be compared to the analytically studied performance of message-passing algorithms, Langevin dynamics, and gradient flow. Several rather non-intuitive results will be unveiled and explained.
Lenka Zdeborova 
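To make the model concrete, the sketch below samples one instance and evaluates the loss whose landscape the talk studies. It is a minimal NumPy sketch assuming the p = 3 normalization common in this line of work (matrix signal x x^T / sqrt(N), tensor signal sqrt(2) x⊗x⊗x / N, Gaussian noise of variances delta2 and delta3, spike on the sphere |x|^2 = N); for brevity the noise is not symmetrized and the sums run over all index tuples, which is a simplification rather than the exact setup analysed in the talk. The function names are illustrative.

```python
import numpy as np


def spiked_matrix_tensor(N, delta2, delta3, rng):
    """Sample a spike x with |x|^2 = N, a noisy matrix Y and a noisy
    order-3 tensor T (noise left unsymmetrized for brevity)."""
    x = rng.standard_normal(N)
    x *= np.sqrt(N) / np.linalg.norm(x)
    Y = np.outer(x, x) / np.sqrt(N) + np.sqrt(delta2) * rng.standard_normal((N, N))
    T = (np.sqrt(2.0) / N) * np.einsum("i,j,k->ijk", x, x, x)
    T = T + np.sqrt(delta3) * rng.standard_normal((N, N, N))
    return x, Y, T


def loss(s, Y, T, delta2, delta3):
    """Gaussian negative log-likelihood of a candidate spike s, up to constants."""
    N = s.shape[0]
    R2 = Y - np.outer(s, s) / np.sqrt(N)
    R3 = T - (np.sqrt(2.0) / N) * np.einsum("i,j,k->ijk", s, s, s)
    return 0.5 * np.sum(R2 ** 2) / delta2 + 0.5 * np.sum(R3 ** 2) / delta3


rng = np.random.default_rng(0)
x, Y, T = spiked_matrix_tensor(N=30, delta2=0.5, delta3=1.0, rng=rng)
s = rng.standard_normal(30)
s *= np.sqrt(30) / np.linalg.norm(s)  # a random point on the same sphere
print(loss(x, Y, T, 0.5, 1.0), loss(s, Y, T, 0.5, 1.0))
```

Gradient flow or Langevin dynamics would then descend `loss` while keeping s on the sphere |s|^2 = N; that part is omitted here.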
Fri 9:40 a.m. – 10:20 a.m.

Poster spotlights
(Spotlight)
A Quantum Field Theory of Representation Learning. Robert Bamler (University of California at Irvine)*; Stephan Mandt (University of California, Irvine)
Covariance in Physics and Convolutional Neural Networks. Miranda Cheng (University of Amsterdam)*; Vassilis Anagiannis (University of Amsterdam); Maurice Weiler (University of Amsterdam); Pim de Haan (University of Amsterdam); Taco S. Cohen (Qualcomm AI Research); Max Welling (University of Amsterdam)
Scale Steerable Filters for Locally Scale-Invariant Convolutional Neural Networks. Rohan Ghosh (National University of Singapore)*; Anupam Gupta (National University of Singapore)
Towards a Definition of Disentangled Representations. Irina Higgins (DeepMind)*; David Amos (DeepMind); Sebastien Racaniere (DeepMind); David Pfau (DeepMind); Loic Matthey (DeepMind); Danilo Jimenez Rezende (Google DeepMind)
Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes. Roman Novak (Google Brain)*; Lechao Xiao (Google Brain); Jaehoon Lee (Google Brain); Yasaman Bahri (Google Brain); Greg Yang (Microsoft Research AI); Jiri Hron (University of Cambridge); Daniel Abolafia (Google Brain); Jeffrey Pennington (Google Brain); Jascha Sohl-Dickstein (Google Brain)
Finite size corrections for neural network Gaussian processes. Joseph M Antognini (Whisper AI)*
Pathological Spectrum of the Fisher Information Matrix in Deep Neural Networks. Ryo Karakida (National Institute of Advanced Industrial Science and Technology)*; Shotaro Akaho (AIST); Shunichi Amari (RIKEN)
Inferring the quantum density matrix with machine learning. Kyle Cranmer (New York University); Siavash Golkar (NYU)*; Duccio Pappadopulo (Bloomberg)
Jet grooming through reinforcement learning. Frederic Dreyer (University of Oxford)*; Stefano Carrazza (University of Milan)
Roman Novak, Frederic Dreyer, Siavash Golkar, Irina Higgins, Joe Antognini, Ryo Karakida, Rohan Ghosh
Fri 10:20 a.m. – 11:00 a.m.

Break and poster discussion
(Break and Poster)
Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes. Roman Novak (Google Brain); Lechao Xiao (Google Brain); Jaehoon Lee (Google Brain); Yasaman Bahri (Google Brain); Greg Yang (Microsoft Research AI); Jiri Hron (University of Cambridge); Daniel Abolafia (Google Brain); Jeffrey Pennington (Google Brain); Jascha Sohl-Dickstein (Google Brain)

Fri 11:00 a.m. – 11:30 a.m.

On the Interplay between Physics and Deep Learning
(Invited talk)
Speaker: Kyle Cranmer (NYU)
Abstract: The interplay between physics and deep learning is typically divided into two themes. The first is “physics for deep learning,” where techniques from physics are brought to bear on understanding the dynamics of learning. The second is “deep learning for physics,” which focuses on the application of deep learning techniques to physics problems. I will present a more nuanced view of this interplay, with examples of how the structure of physics problems has inspired advances in deep learning and how it yields insights on topics such as inductive bias, interpretability, and causality.
Kyle Cranmer 
Fri 11:30 a.m. – 12:00 p.m.

Why Deep Learning Works: Traditional and Heavy-Tailed Implicit Self-Regularization in Deep Neural Networks
(Invited talk)
Speaker: Michael Mahoney (ICSI and Department of Statistics, University of California at Berkeley)
Abstract: Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production-quality, pre-trained models and smaller models trained from scratch. Empirical and theoretical results clearly indicate that the DNN training process itself implicitly implements a form of self-regularization, implicitly sculpting a more regularized energy or penalty landscape. In particular, the empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally regularized statistical models, even in the absence of exogenously specifying traditional forms of explicit regularization. Building on relatively recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, and applying them to these empirical results, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of implicit self-regularization. For smaller and/or older DNNs, this implicit self-regularization is like traditional Tikhonov regularization, in that there appears to be a "size scale" separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of heavy-tailed self-regularization, similar to the self-organization seen in the statistical physics of disordered systems. This implicit self-regularization can depend strongly on the many knobs of the training process. In particular, by exploiting the generalization gap phenomenon, we demonstrate that we can cause a small model to exhibit all 5+1 phases of training simply by changing the batch size. This demonstrates that, all else being equal, DNN optimization with larger batch sizes leads to less well implicitly regularized models, and it provides an explanation for the generalization gap phenomenon.
Coupled with work on energy landscapes and heavy-tailed spin glasses, it also suggests an explanation of why deep learning works. Joint work with Charles Martin of Calculation Consulting, Inc.
Michael Mahoney 
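The basic measurement behind this analysis is the ESD of a layer's correlation matrix. As a rough illustration, the NumPy sketch below computes the eigenvalues of X = W^T W / N for an N x M weight matrix and compares the largest one with the Marchenko-Pastur bulk edge expected for an i.i.d. random matrix of the same shape. The Student-t matrix standing in for a "heavy-tailed" layer, the standardization step, and the single summary ratio are illustrative assumptions; the phase classification described in the talk relies on a more detailed analysis of the full ESD (for example, power-law fits of its tail).

```python
import numpy as np


def esd(W):
    """Eigenvalues of X = W^T W / N for an N x M layer weight matrix (N >= M)."""
    N = W.shape[0]
    return np.linalg.eigvalsh(W.T @ W / N)


def mp_bulk_edges(N, M, sigma2=1.0):
    """Marchenko-Pastur support [lambda_-, lambda_+] for i.i.d. entries of
    variance sigma2 and aspect ratio Q = N / M >= 1."""
    r = np.sqrt(M / N)
    return sigma2 * (1 - r) ** 2, sigma2 * (1 + r) ** 2


def top_eig_over_mp_edge(W):
    """Largest ESD eigenvalue divided by the MP bulk edge, after
    standardizing the entries of W to zero mean and unit variance."""
    N, M = W.shape
    Wn = (W - W.mean()) / W.std()
    _, lam_plus = mp_bulk_edges(N, M)
    return float(esd(Wn).max() / lam_plus)


rng = np.random.default_rng(0)
N, M = 1000, 250
W_random = rng.standard_normal((N, M))          # i.i.d. Gaussian, "random-like" layer
W_heavy = rng.standard_t(df=1.5, size=(N, M))  # hypothetical heavy-tailed stand-in
print(top_eig_over_mp_edge(W_random), top_eig_over_mp_edge(W_heavy))
```

For the Gaussian matrix the ratio should sit close to 1, since its spectrum is well described by the Marchenko-Pastur law; for the heavy-tailed matrix a few very large entries push eigenvalues far outside the bulk.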
Fri 12:00 p.m. – 12:15 p.m.

Analyzing the dynamics of online learning in overparameterized two-layer neural networks
(Oral)

Sebastian Goldt 
Fri 12:15 p.m. – 12:30 p.m.

Convergence Properties of Neural Networks on Separable Data
(Oral)

Remi Tachet des Combes 
Fri 12:30 p.m. – 2:00 p.m.

Lunch
(Break)


Fri 2:00 p.m. – 2:30 p.m.

Is Optimization a sufficient language to understand Deep Learning?
(Invited talk)
Speaker: Sanjeev Arora (Princeton/IAS)
Abstract: There is an old debate in neuroscience about whether or not learning has to boil down to optimizing a single cost function. This talk will suggest that even to understand the mathematical properties of deep learning, we have to go beyond the conventional view of "optimizing a single cost function". The reason is that phenomena occur along the gradient descent trajectory that are not fully captured by the value of the cost function. I will illustrate this briefly with three new results that involve such phenomena: (i) (joint work with Cohen, Hu, and Luo) how deep matrix factorization solves matrix completion better than classical algorithms, https://arxiv.org/abs/1905.13655; (ii) (joint with Du, Hu, Li, Salakhutdinov, and Wang) how to compute (exactly) with an infinitely wide net ("mean field limit", in physics terms), https://arxiv.org/abs/1904.11955; (iii) (joint with Kuditipudi, Wang, Hu, Lee, Zhang, Li, Ge) explaining mode connectivity for real-life deep nets (the phenomenon that low-cost solutions found by gradient descent are interconnected in the parameter space via low-cost paths; see Garipov et al. '18 and Draxler et al. '18).
Sanjeev Arora 
Fri 2:30 p.m. – 2:45 p.m.

Towards Understanding Regularization in Batch Normalization
(Oral)


Fri 2:45 p.m. – 3:00 p.m.

How Noise during Training Affects the Hessian Spectrum
(Oral)


Fri 3:00 p.m. – 3:30 p.m.

Break and poster discussion
(Break and Poster)


Fri 3:30 p.m. – 4:00 p.m.

Understanding overparameterized neural networks
(Invited talk)
Speaker: Jascha Sohl-Dickstein (Google Brain)
Abstract: As neural networks become highly overparameterized, their accuracy improves, and their behavior becomes easier to analyze theoretically. I will give an introduction to a rapidly growing body of work that examines the learning dynamics and the prior over functions induced by infinitely wide, randomly initialized neural networks. Core results that I will discuss include: that the distribution over functions computed by a wide neural network often corresponds to a Gaussian process with a particular compositional kernel, both before and after training; that the predictions of wide neural networks are linear in their parameters throughout training; and that this perspective enables analytic predictions for how trainability depends on hyperparameters and architecture. These results enable surprising capabilities: for instance, the evaluation of the test set predictions that would come from an infinitely wide trained neural network without ever instantiating a neural network, or the rapid training of 10,000+ layer convolutional networks. I will argue that this growing understanding of neural networks in the limit of infinite width is foundational for future theoretical and practical understanding of deep learning.
Jascha Sohl-Dickstein
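The "compositional kernel" mentioned above can be computed layer by layer without ever building a network. The pure-Python sketch below evaluates the standard recursion for the NNGP kernel of an infinitely wide, fully connected ReLU network (the arc-cosine kernel), assuming weights of variance sigma_w2 / fan-in and biases of variance sigma_b2. It covers only the prior at initialization, not the kernel governing training; the function and parameter names are hypothetical, and the default sigma_w2 = 2, sigma_b2 = 0 (which keeps the diagonal K(x, x) fixed across layers) is an illustrative choice.

```python
import math


def relu_nngp_kernel(x1, x2, depth, sigma_w2=2.0, sigma_b2=0.0):
    """Return (K(x1,x1), K(x1,x2), K(x2,x2)) for the pre-activations after
    `depth` hidden ReLU layers of an infinitely wide fully connected network."""
    d = len(x1)
    # Covariances of the first-layer pre-activations.
    k11 = sigma_b2 + sigma_w2 * sum(a * a for a in x1) / d
    k22 = sigma_b2 + sigma_w2 * sum(b * b for b in x2) / d
    k12 = sigma_b2 + sigma_w2 * sum(a * b for a, b in zip(x1, x2)) / d
    for _ in range(depth):
        norm = math.sqrt(k11 * k22)
        cos_t = max(-1.0, min(1.0, k12 / norm))  # clamp for rounding safety
        theta = math.acos(cos_t)
        # E[relu(u) relu(v)] for (u, v) Gaussian with covariances k11, k12, k22.
        e12 = norm * (math.sin(theta) + (math.pi - theta) * cos_t) / (2 * math.pi)
        k12 = sigma_b2 + sigma_w2 * e12
        # E[relu(u)^2] = k11 / 2 for a centered Gaussian u.
        k11 = sigma_b2 + sigma_w2 * k11 / 2.0
        k22 = sigma_b2 + sigma_w2 * k22 / 2.0
    return k11, k12, k22


print(relu_nngp_kernel([1.0, 0.0], [0.0, 1.0], depth=3))
```

With these defaults, two orthogonal inputs start uncorrelated and become increasingly correlated with depth, one of the depth-dependent effects that the infinite-width view makes analytically tractable.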
Fri 4:00 p.m. – 4:15 p.m.

Asymptotics of Wide Networks from Feynman Diagrams
(Oral)

Guy Gur-Ari
Fri 4:15 p.m. – 4:30 p.m.

A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off
(Oral)

Dar Gilboa 
Fri 4:30 p.m. – 4:45 p.m.

Deep Learning on the 2-Dimensional Ising Model to Extract the Crossover Region
(Oral)

Nick Walker 
Fri 4:45 p.m. – 5:00 p.m.

Learning the Arrow of Time
(Oral)

Nasim Rahaman 
Fri 5:00 p.m. – 6:00 p.m.

Poster discussion
(Poster Session)
Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes. Roman Novak (Google Brain)*; Lechao Xiao (Google Brain); Jaehoon Lee (Google Brain); Yasaman Bahri (Google Brain); Greg Yang (Microsoft Research AI); Jiri Hron (University of Cambridge); Daniel Abolafia (Google Brain); Jeffrey Pennington (Google Brain); Jascha Sohl-Dickstein (Google Brain)
Topology of Learning in Artificial Neural Networks. Maxime Gabella (Magma Learning)*
Jet grooming through reinforcement learning. Frederic Dreyer (University of Oxford)*; Stefano Carrazza (University of Milan)
Inferring the quantum density matrix with machine learning. Kyle Cranmer (New York University); Siavash Golkar (NYU)*; Duccio Pappadopulo (Bloomberg)
Backdrop: Stochastic Backpropagation. Siavash Golkar (NYU)*; Kyle Cranmer (New York University)
Explain pathology in Deep Gaussian Process using Chaos Theory. Anh Tong (UNIST)*; Jaesik Choi (Ulsan National Institute of Science and Technology)
Towards a Definition of Disentangled Representations. Irina Higgins (DeepMind)*; David Amos (DeepMind); Sebastien Racaniere (DeepMind); David Pfau (DeepMind); Loic Matthey (DeepMind); Danilo Jimenez Rezende (DeepMind)
Towards Understanding Regularization in Batch Normalization. Ping Luo (The Chinese University of Hong Kong); Xinjiang Wang; Wenqi Shao (The Chinese University of Hong Kong)*; Zhanglin Peng (SenseTime)
Covariance in Physics and Convolutional Neural Networks. Miranda Cheng (University of Amsterdam)*; Vassilis Anagiannis (University of Amsterdam); Maurice Weiler (University of Amsterdam); Pim de Haan (University of Amsterdam); Taco S. Cohen (Qualcomm AI Research); Max Welling (University of Amsterdam)
Mean-field theory of activation functions in Deep Neural Networks. Mirco Milletari (Microsoft)*; Thiparat Chotibut (SUTD); Paolo E. Trevisanutto (National University of Singapore)
Finite size corrections for neural network Gaussian processes. Joseph M Antognini (Whisper AI)*
Analysing the dynamics of online learning in over-parameterised two-layer neural networks. Sebastian Goldt (Institut de Physique théorique, Paris)*; Madhu Advani (Harvard University); Andrew Saxe (University of Oxford); Florent Krzakala (École Normale Supérieure); Lenka Zdeborova (CEA Saclay)
A Halo Merger Tree Generation and Evaluation Framework. Sandra Robles (Universidad Autónoma de Madrid); Jonathan Gómez (Pontificia Universidad Católica de Chile); Adín Ramírez Rivera (University of Campinas)*; Jenny Gonzáles (Pontificia Universidad Católica de Chile); Nelson Padilla (Pontificia Universidad Católica de Chile); Diego Dujovne (Universidad Diego Portales)
Learning Symmetries of Classical Integrable Systems. Roberto Bondesan (Qualcomm AI Research)*; Austen Lamacraft (Cavendish Laboratory, University of Cambridge, UK)
Pathological Spectrum of the Fisher Information Matrix in Deep Neural Networks. Ryo Karakida (National Institute of Advanced Industrial Science and Technology)*; Shotaro Akaho (AIST); Shunichi Amari (RIKEN)
How Noise during Training Affects the Hessian Spectrum. Mingwei Wei (Northwestern University); David Schwab (Facebook AI Research)*
A Quantum Field Theory of Representation Learning. Robert Bamler (University of California at Irvine)*; Stephan Mandt (University of California, Irvine)
Convergence Properties of Neural Networks on Separable Data. Remi Tachet des Combes (Microsoft Research Montreal)*; Mohammad Pezeshki (Mila & University of Montreal); Samira Shabanian (Microsoft, Canada); Aaron Courville (MILA, Université de Montréal); Yoshua Bengio (Mila)
Universality and Capacity Metrics in Deep Neural Networks. Michael Mahoney (University of California, Berkeley)*; Charles Martin (Calculation Consulting)
Asymptotics of Wide Networks from Feynman Diagrams. Guy Gur-Ari (Google)*; Ethan Dyer (Google)
Deep Learning on the 2-Dimensional Ising Model to Extract the Crossover Region. Nicholas Walker (Louisiana State Univ - Baton Rouge)*
Large Scale Structure of the Loss Landscape of Neural Networks. Stanislav Fort (Stanford University)*; Stanislaw Jastrzebski (New York University)
Momentum Enables Large Batch Training. Samuel L Smith (DeepMind)*; Erich Elsen (Google); Soham De (DeepMind)
Learning the Arrow of Time. Nasim Rahaman (University of Heidelberg)*; Steffen Wolf (Heidelberg University); Anirudh Goyal (University of Montreal); Roman Remme (Heidelberg University); Yoshua Bengio (Mila)
Scale Steerable Filters for Locally Scale-Invariant Convolutional Neural Networks. Rohan Ghosh (National University of Singapore)*; Anupam Gupta (National University of Singapore)
A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off. Yaniv Blumenfeld (Technion)*; Dar Gilboa (Columbia University); Daniel Soudry (Technion)
Rethinking Complexity in Deep Learning: A View from Function Space. Aristide Baratin (Mila, Université de Montréal)*; Thomas George (MILA, Université de Montréal); César Laurent (Mila, Université de Montréal); Valentin Thomas (MILA); Guillaume Lajoie (Université de Montréal, Mila); Simon Lacoste-Julien (Mila, Université de Montréal)
The Deep Learning Limit: Negative Neural Network eigenvalues just noise? Diego Granziol (Oxford)*; Stefan Zohren (University of Oxford); Stephen Roberts (Oxford); Dmitry P Vetrov (Higher School of Economics); Andrew Gordon Wilson (Cornell University); Timur Garipov (Samsung AI Center in Moscow)
Gradient descent in Gaussian random fields as a toy model for high-dimensional optimisation. Mariano Chouza (Tower Research Capital); Stephen Roberts (Oxford); Stefan Zohren (University of Oxford)*
Deep Learning for Inverse Problems. Abhejit Rajagopal (University of California, Santa Barbara)*; Vincent R Radzicki (University of California, Santa Barbara)
Roman Novak, Maxime Gabella, Frederic Dreyer, Siavash Golkar, Anh Tong, Irina Higgins, Mirco Milletari, Joe Antognini, Sebastian Goldt, Adín Ramírez Rivera, Roberto Bondesan, Ryo Karakida, Remi Tachet des Combes, Michael Mahoney, Nick Walker, Stanislav Fort, Samuel Smith, Rohan Ghosh, Aristide Baratin, Diego Granziol, Stephen Roberts, Dmitry Vetrov, Andrew Wilson, César Laurent, Valentin Thomas, Simon Lacoste-Julien, Dar Gilboa, Daniel Soudry, Anupam Gupta, Anirudh Goyal, Yoshua Bengio, Erich Elsen, Soham De, Stanislaw Jastrzebski, Charles H Martin, Samira Shabanian, Aaron Courville, Shotaro Akaho, Lenka Zdeborova, Ethan Dyer, Maurice Weiler, Pim de Haan, Taco Cohen, Max Welling, Ping Luo, Zhanglin Peng, Nasim Rahaman, Loic Matthey, Danilo J. Rezende, Jaesik Choi, Kyle Cranmer, Lechao Xiao, Jaehoon Lee, Yasaman Bahri, Jeffrey Pennington, Greg Yang, Jiri Hron, Jascha Sohl-Dickstein, Guy Gur-Ari

Author Information
Jaehoon Lee (Google Brain)
Jeffrey Pennington (Google Brain)
Yasaman Bahri (Google Brain)
Max Welling (University of Amsterdam & Qualcomm)
Surya Ganguli (Stanford)
Joan Bruna (New York University)