Though the purview of physics is broad and includes many loosely connected subdisciplines, a unifying theme is the endeavor to provide concise, quantitative, and predictive descriptions of the often large and complex systems governing phenomena in the natural world. While one could debate how closely deep learning is connected to the natural world, it is undeniably the case that deep learning systems are large and complex; as such, it is reasonable to ask whether the rich body of ideas and powerful tools developed by theoretical physicists can be harnessed to improve our understanding of deep learning. The goal of this workshop is to investigate this question by bringing together experts in theoretical physics and deep learning to stimulate interaction and to begin exploring how theoretical physics can shed light on the theory of deep learning.
We believe ICML is an appropriate venue for this gathering: members of both communities are frequently in attendance, and deep learning theory has emerged as a focus at the conference, both as an independent track in the main conference and in numerous workshops over the last few years. Moreover, the conference has seen an increasing number of papers using physics tools and ideas to draw insights into deep learning.
Fri 8:30 a.m. - 8:40 a.m.
Opening Remarks
Jaehoon Lee · Jeffrey Pennington · Yasaman Bahri · Max Welling · Surya Ganguli · Joan Bruna
Fri 8:40 a.m. - 9:10 a.m.
Linearized two-layers neural networks in high dimension (Invited talk)
Speaker: Andrea Montanari (Stanford)
Abstract: We consider the problem of learning an unknown function f on the d-dimensional sphere with respect to the square loss, given i.i.d. samples (y_i, x_i), where x_i is a feature vector uniformly distributed on the sphere and y_i = f(x_i). We study two popular classes of models that can be regarded as linearizations of two-layer neural networks around a random initialization: (RF) the random feature model of Rahimi-Recht; (NT) the neural tangent kernel model of Jacot-Gabriel-Hongler. Both approaches can also be regarded as randomized approximations of kernel ridge regression (with respect to different kernels) and hence enjoy universal approximation properties when the number of neurons N diverges, for fixed dimension d. We prove that, if both d and N are large, the behavior of these models is instead remarkably simpler. If N is of smaller order than d^2, then RF performs no better than linear regression with respect to the raw features x_i, and NT performs no better than linear regression with respect to degree-one and degree-two monomials in the x_i. More generally, if N is of smaller order than d^{k+1}, then RF fits at most a degree-k polynomial in the raw features, and NT fits at most a degree-(k+1) polynomial. We then focus on the case of quadratic functions and N = O(d). We show that the gap in generalization error between fully trained neural networks and the linearized models is potentially unbounded. [Based on joint work with Behrooz Ghorbani, Song Mei, and Theodor Misiakiewicz]
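To make the two linearized model classes concrete, here is a minimal numpy sketch, not code from the talk: it builds the (RF) and (NT) feature maps of a ReLU two-layer network around a random initialization and fits each by ridge regression. The dimensions, the toy quadratic target, and the ridge penalty are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, n = 20, 100, 500                            # input dim, hidden width, samples

X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # features on the unit sphere
y = X[:, 0] * X[:, 1]                             # a toy quadratic target

W = rng.standard_normal((N, d)) / np.sqrt(d)      # fixed random first layer

def rf_features(X):
    # (RF) random features: only the second layer is trained, so the model
    # is linear in phi_RF(x) = relu(W x).
    return np.maximum(X @ W.T, 0.0)

def nt_features(X):
    # (NT) neural tangent features: gradients of the network with respect to
    # the first-layer weights at initialization, flattened to N*d dimensions.
    acts = (X @ W.T > 0.0).astype(float)          # ReLU derivative, shape (n, N)
    return (acts[:, :, None] * X[:, None, :]).reshape(len(X), -1)

def ridge_predict(Phi, y, lam=1e-3):
    # ridge regression in the given feature space, evaluated on the training set
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return Phi @ np.linalg.solve(A, Phi.T @ y)

for name, feats in [("RF", rf_features), ("NT", nt_features)]:
    Phi = feats(X)
    mse = np.mean((ridge_predict(Phi, y) - y) ** 2)
    print(f"{name}: feature dim = {Phi.shape[1]}, train MSE = {mse:.4f}")
```

At these small sizes the asymptotic statements in the abstract do not literally apply; the sketch is only meant to show what "linear in the RF or NT features" means operationally.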
Fri 9:10 a.m. - 9:40 a.m.
Loss landscape and behaviour of algorithms in the spiked matrix-tensor model (Invited talk)
Speaker: Lenka Zdeborova (CEA/Saclay)
Abstract: A key question of current interest is: How are properties of optimization and sampling algorithms influenced by the properties of the loss function in noisy high-dimensional non-convex settings? Answering this question for deep neural networks is a landmark goal of many ongoing works. In this talk I will answer this question in unprecedented detail for the spiked matrix-tensor model. Information-theoretic limits and Kac-Rice analysis of the loss landscape will be compared to the analytically studied performance of message-passing algorithms, of the Langevin dynamics, and of the gradient flow. Several rather non-intuitive results will be unveiled and explained.
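As a point of reference for the model itself, here is a minimal, illustrative numpy sketch, not the analysis from the talk: it plants a spike x* on the sphere, observes it through a noisy matrix and a noisy order-3 tensor, and runs projected gradient descent on a simplified version of the combined loss, reporting the overlap with the spike. Dimension, noise levels, step size, and the constant factors in the gradient are arbitrary choices and do not follow the talk's conventions.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 60
delta2, delta3 = 0.05, 4.0                         # matrix / tensor noise levels

x_star = rng.standard_normal(d)
x_star *= np.sqrt(d) / np.linalg.norm(x_star)      # planted spike with |x*|^2 = d

Y = np.outer(x_star, x_star) / np.sqrt(d) + np.sqrt(delta2) * rng.standard_normal((d, d))
Y = (Y + Y.T) / 2                                  # symmetrize the matrix observation
T = (np.einsum('i,j,k->ijk', x_star, x_star, x_star) / d
     + np.sqrt(delta3) * rng.standard_normal((d, d, d)))

def grad(x):
    # gradient of the (simplified) loss: cross terms between the estimate x
    # and the matrix / tensor observations, weighted by the noise levels
    g_mat = -(Y @ x) / (delta2 * np.sqrt(d))
    g_ten = -np.einsum('ijk,j,k->i', T, x, x) / (delta3 * d)
    return g_mat + g_ten

x = rng.standard_normal(d)
x *= np.sqrt(d) / np.linalg.norm(x)                # random start on the sphere
lr = 0.05
for _ in range(500):
    x = x - lr * grad(x)
    x *= np.sqrt(d) / np.linalg.norm(x)            # project back onto the sphere

print(f"overlap with planted spike: {abs(x @ x_star) / d:.3f}")
```

Varying delta2 and delta3 and watching when the overlap stays near zero versus grows is a crude, empirical stand-in for the phase diagrams the talk derives analytically.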
Fri 9:40 a.m. - 10:20 a.m.
Poster spotlights (Spotlight)
A Quantum Field Theory of Representation Learning. Robert Bamler (University of California at Irvine)*; Stephan Mandt (University of California, Irvine)
Covariance in Physics and Convolutional Neural Networks. Miranda Cheng (University of Amsterdam)*; Vassilis Anagiannis (University of Amsterdam); Maurice Weiler (University of Amsterdam); Pim de Haan (University of Amsterdam); Taco S. Cohen (Qualcomm AI Research); Max Welling (University of Amsterdam)
Scale Steerable Filters for Locally Scale-Invariant Convolutional Neural Networks. Rohan Ghosh (National University of Singapore)*; Anupam Gupta (National University of Singapore)
Towards a Definition of Disentangled Representations. Irina Higgins (DeepMind)*; David Amos (DeepMind); Sebastien Racaniere (DeepMind); David Pfau (DeepMind); Loic Matthey (DeepMind); Danilo Jimenez Rezende (Google DeepMind)
Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes. Roman Novak (Google Brain)*; Lechao Xiao (Google Brain); Jaehoon Lee (Google Brain); Yasaman Bahri (Google Brain); Greg Yang (Microsoft Research AI); Jiri Hron (University of Cambridge); Daniel Abolafia (Google Brain); Jeffrey Pennington (Google Brain); Jascha Sohl-Dickstein (Google Brain)
Finite size corrections for neural network Gaussian processes. Joseph M Antognini (Whisper AI)*
Pathological Spectrum of the Fisher Information Matrix in Deep Neural Networks. Ryo Karakida (National Institute of Advanced Industrial Science and Technology)*; Shotaro Akaho (AIST); Shun-ichi Amari (RIKEN)
Inferring the quantum density matrix with machine learning. Kyle Cranmer (New York University); Siavash Golkar (NYU)*; Duccio Pappadopulo (Bloomberg)
Jet grooming through reinforcement learning. Frederic Dreyer (University of Oxford)*; Stefano Carrazza (University of Milan)
Roman Novak · Frederic Dreyer · Siavash Golkar · Irina Higgins · Joe Antognini · Ryo Karakida · Rohan Ghosh
Fri 10:20 a.m. - 11:00 a.m.
Break and poster discussion (Break and Poster)
Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes. Roman Novak (Google Brain); Lechao Xiao (Google Brain); Jaehoon Lee (Google Brain); Yasaman Bahri (Google Brain); Greg Yang (Microsoft Research AI); Jiri Hron (University of Cambridge); Daniel Abolafia (Google Brain); Jeffrey Pennington (Google Brain); Jascha Sohl-Dickstein (Google Brain)
Fri 11:00 a.m. - 11:30 a.m.
On the Interplay between Physics and Deep Learning (Invited talk)
Speaker: Kyle Cranmer (NYU)
Abstract: The interplay between physics and deep learning is typically divided into two themes. The first is “physics for deep learning,” where techniques from physics are brought to bear on understanding the dynamics of learning. The second is “deep learning for physics,” which focuses on the application of deep learning techniques to physics problems. I will present a more nuanced view of this interplay, with examples of how the structure of physics problems has inspired advances in deep learning and how it yields insights on topics such as inductive bias, interpretability, and causality.
Fri 11:30 a.m. - 12:00 p.m.
Why Deep Learning Works: Traditional and Heavy-Tailed Implicit Self-Regularization in Deep Neural Networks (Invited talk)
Speaker: Michael Mahoney (ICSI and Department of Statistics, University of California at Berkeley)
Abstract: Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production-quality, pre-trained models and smaller models trained from scratch. Empirical and theoretical results clearly indicate that the DNN training process itself implicitly implements a form of self-regularization, implicitly sculpting a more regularized energy or penalty landscape. In particular, the empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally regularized statistical models, even in the absence of exogenously specified traditional forms of explicit regularization. Building on relatively recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, and applying them to these empirical results, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of implicit self-regularization. For smaller and/or older DNNs, this implicit self-regularization is like traditional Tikhonov regularization, in that there appears to be a "size scale" separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of heavy-tailed self-regularization, similar to the self-organization seen in the statistical physics of disordered systems. This implicit self-regularization can depend strongly on the many knobs of the training process. In particular, by exploiting the generalization gap phenomenon, we demonstrate that we can cause a small model to exhibit all 5+1 phases of training simply by changing the batch size. This demonstrates that, all else being equal, DNN optimization with larger batch sizes leads to less well implicitly regularized models, and it provides an explanation for the generalization gap phenomenon. Coupled with work on energy landscapes and heavy-tailed spin glasses, it also suggests an explanation of why deep learning works. Joint work with Charles Martin of Calculation Consulting, Inc.
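As a concrete illustration of the kind of measurement involved, here is a minimal numpy sketch, not the authors' code: it computes the empirical spectral density of a layer's correlation matrix and estimates a power-law tail exponent with a simple Hill estimator. A freshly initialized random matrix stands in for a trained layer, so the spectrum here is Marchenko-Pastur-like rather than heavy-tailed; substituting a real trained weight matrix is the intended use.

```python
import numpy as np

rng = np.random.default_rng(2)
n_out, n_in = 256, 512
W = rng.standard_normal((n_out, n_in))      # stand-in for a trained layer weight matrix

# Empirical spectral density: eigenvalues of the correlation matrix X = W W^T / n_in.
X = W @ W.T / n_in
eigs = np.sort(np.linalg.eigvalsh(X))
eigs = eigs[eigs > 1e-12]

# Simple Hill estimate of the tail index from the k largest eigenvalues,
# using the (k+1)-th largest as the threshold.
k = 40
threshold = eigs[-(k + 1)]
alpha = k / np.sum(np.log(eigs[-k:] / threshold))
print(f"largest eigenvalue: {eigs[-1]:.4f}")
print(f"Hill tail-index estimate over top {k} eigenvalues: {alpha:.2f}")
```

In the talk's framing, the shape of this eigenvalue distribution (bulk plus bleed-out, spikes, or a genuinely heavy tail) is what distinguishes the different phases of training.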
Fri 12:00 p.m. - 12:15 p.m.
Analyzing the dynamics of online learning in over-parameterized two-layer neural networks (Oral)
Sebastian Goldt
Fri 12:15 p.m. - 12:30 p.m.
Convergence Properties of Neural Networks on Separable Data (Oral)
Remi Tachet des Combes
Fri 12:30 p.m. - 2:00 p.m.
Lunch
Fri 2:00 p.m. - 2:30 p.m.
Is Optimization a Sufficient Language to Understand Deep Learning? (Invited talk)
Speaker: Sanjeev Arora (Princeton/IAS)
Abstract: There is an old debate in neuroscience about whether or not learning has to boil down to optimizing a single cost function. This talk will suggest that even to understand the mathematical properties of deep learning, we have to go beyond the conventional view of "optimizing a single cost function". The reason is that phenomena occur along the gradient descent trajectory that are not fully captured in the value of the cost function. I will illustrate briefly with three new results that involve such phenomena:
(i) How deep matrix factorization solves matrix completion better than classical algorithms (joint work with Cohen, Hu, and Luo): https://arxiv.org/abs/1905.13655
(ii) How to compute (exactly) with an infinitely wide net (the "mean field limit", in physics terms) (joint with Du, Hu, Li, Salakhutdinov, and Wang): https://arxiv.org/abs/1904.11955
(iii) Explaining mode connectivity for real-life deep nets, i.e., the phenomenon that low-cost solutions found by gradient descent are interconnected in parameter space via low-cost paths (joint with Kuditipudi, Wang, Hu, Lee, Zhang, Li, and Ge); see Garipov et al. '18 and Draxler et al. '18.
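To give a feel for result (i), here is a toy numpy sketch, not the authors' code: matrix completion by gradient descent on a depth-3 linear factorization W3 W2 W1, fit only on the observed entries. The matrix size, rank, observation fraction, initialization scale, and step size are arbitrary illustrative choices; the point being illustrated is that the depth of the factorization, not an explicit rank penalty, is what biases gradient descent toward low-rank solutions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, rank = 30, 2
M = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, n))  # rank-2 ground truth
M /= np.linalg.norm(M, 2)                               # normalize spectral norm to 1
mask = rng.random((n, n)) < 0.3                         # ~30% of entries observed

Ws = [0.05 * rng.standard_normal((n, n)) for _ in range(3)]  # small init, depth 3
lr = 0.05

def product(Ws):
    P = Ws[0]
    for W in Ws[1:]:
        P = W @ P
    return P

for _ in range(5000):
    P = product(Ws)                      # current reconstruction W3 @ W2 @ W1
    R = mask * (P - M)                   # residual on observed entries only
    grads = [
        Ws[1].T @ Ws[2].T @ R,           # dL/dW1
        Ws[2].T @ R @ Ws[0].T,           # dL/dW2
        R @ (Ws[1] @ Ws[0]).T,           # dL/dW3
    ]
    for W, g in zip(Ws, grads):
        W -= lr * g

P = product(Ws)
train = np.linalg.norm(mask * (P - M)) / np.linalg.norm(mask * M)
test = np.linalg.norm(~mask * (P - M)) / np.linalg.norm(~mask * M)
print(f"relative error  observed: {train:.3f}  unobserved: {test:.3f}")
```

The loss only sees the observed entries, so any improvement on the unobserved ones comes from the implicit bias of the deep parameterization, which is exactly the kind of trajectory-dependent phenomenon the abstract argues a single cost-function value cannot capture.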
Fri 2:30 p.m. - 2:45 p.m.
Towards Understanding Regularization in Batch Normalization (Oral)
Fri 2:45 p.m. - 3:00 p.m.
How Noise during Training Affects the Hessian Spectrum (Oral)
Fri 3:00 p.m. - 3:30 p.m.
Break and poster discussion (Break and Poster)
Fri 3:30 p.m. - 4:00 p.m.
Understanding overparameterized neural networks (Invited talk)
Speaker: Jascha Sohl-Dickstein (Google Brain)
Abstract: As neural networks become highly overparameterized, their accuracy improves, and their behavior becomes easier to analyze theoretically. I will give an introduction to a rapidly growing body of work which examines the learning dynamics and the prior over functions induced by infinitely wide, randomly initialized neural networks. Core results that I will discuss include: that the distribution over functions computed by a wide neural network often corresponds to a Gaussian process with a particular compositional kernel, both before and after training; that the predictions of wide neural networks are linear in their parameters throughout training; and that this perspective enables analytic predictions for how trainability depends on hyperparameters and architecture. These results provide surprising capabilities, for instance the evaluation of test-set predictions that would come from an infinitely wide trained neural network without ever instantiating a neural network, or the rapid training of 10,000+ layer convolutional networks. I will argue that this growing understanding of neural networks in the limit of infinite width is foundational for future theoretical and practical understanding of deep learning.
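As a concrete illustration of the infinite-width correspondence, here is a minimal numpy sketch, not the tooling used by the speaker: it evaluates the NNGP kernel of an infinitely wide, randomly initialized fully connected ReLU network via the standard layer-wise recursion, then produces test predictions by exact GP regression, without ever instantiating a network. Depth, weight/bias variances, the toy data, and the observation noise are arbitrary illustrative choices.

```python
import numpy as np

sigma_w2, sigma_b2, depth = 2.0, 0.1, 3          # weight/bias variances, hidden layers

def relu_expectation(kxx, kyy, kxy):
    # E[relu(u) relu(v)] for centered Gaussian (u, v) with the given covariances
    c = np.clip(kxy / np.sqrt(kxx * kyy), -1.0, 1.0)
    theta = np.arccos(c)
    return np.sqrt(kxx * kyy) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

def nngp_kernel(X1, X2):
    # layer-wise recursion for the infinite-width (NNGP) kernel of a ReLU net
    d = X1.shape[1]
    k12 = sigma_b2 + sigma_w2 * (X1 @ X2.T) / d
    k11 = sigma_b2 + sigma_w2 * np.sum(X1**2, 1) / d      # diagonal entries for X1
    k22 = sigma_b2 + sigma_w2 * np.sum(X2**2, 1) / d      # diagonal entries for X2
    for _ in range(depth):
        k12 = sigma_b2 + sigma_w2 * relu_expectation(k11[:, None], k22[None, :], k12)
        k11 = sigma_b2 + sigma_w2 * relu_expectation(k11, k11, k11)
        k22 = sigma_b2 + sigma_w2 * relu_expectation(k22, k22, k22)
    return k12

rng = np.random.default_rng(4)
X_train, X_test = rng.standard_normal((50, 10)), rng.standard_normal((5, 10))
y_train = np.sin(X_train[:, 0])

K_tt = nngp_kernel(X_train, X_train)
K_st = nngp_kernel(X_test, X_train)
noise = 1e-3
mean = K_st @ np.linalg.solve(K_tt + noise * np.eye(len(X_train)), y_train)
print("GP posterior mean at test points:", np.round(mean, 3))
```

This is the "prior before training" half of the story; the trained-network predictions mentioned in the abstract require the corresponding tangent-kernel computation, which follows the same recursive pattern.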
Fri 4:00 p.m. - 4:15 p.m.
Asymptotics of Wide Networks from Feynman Diagrams (Oral)
Guy Gur-Ari
Fri 4:15 p.m. - 4:30 p.m.
A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off (Oral)
Dar Gilboa
Fri 4:30 p.m. - 4:45 p.m.
Deep Learning on the 2-Dimensional Ising Model to Extract the Crossover Region (Oral)
Nicholas Walker
Fri 4:45 p.m. - 5:00 p.m.
Learning the Arrow of Time (Oral)
Nasim Rahaman
Fri 5:00 p.m. - 6:00 p.m.
Poster discussion (Poster Session)
Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes. Roman Novak (Google Brain)*; Lechao Xiao (Google Brain); Jaehoon Lee (Google Brain); Yasaman Bahri (Google Brain); Greg Yang (Microsoft Research AI); Jiri Hron (University of Cambridge); Daniel Abolafia (Google Brain); Jeffrey Pennington (Google Brain); Jascha Sohl-Dickstein (Google Brain)
Topology of Learning in Artificial Neural Networks. Maxime Gabella (Magma Learning)*
Jet grooming through reinforcement learning. Frederic Dreyer (University of Oxford)*; Stefano Carrazza (University of Milan)
Inferring the quantum density matrix with machine learning. Kyle Cranmer (New York University); Siavash Golkar (NYU)*; Duccio Pappadopulo (Bloomberg)
Backdrop: Stochastic Backpropagation. Siavash Golkar (NYU)*; Kyle Cranmer (New York University)
Explain pathology in Deep Gaussian Process using Chaos Theory. Anh Tong (UNIST)*; Jaesik Choi (Ulsan National Institute of Science and Technology)
Towards a Definition of Disentangled Representations. Irina Higgins (DeepMind)*; David Amos (DeepMind); Sebastien Racaniere (DeepMind); David Pfau (DeepMind); Loic Matthey (DeepMind); Danilo Jimenez Rezende (DeepMind)
Towards Understanding Regularization in Batch Normalization. Ping Luo (The Chinese University of Hong Kong); Xinjiang Wang; Wenqi Shao (The Chinese University of Hong Kong)*; Zhanglin Peng (SenseTime)
Covariance in Physics and Convolutional Neural Networks. Miranda Cheng (University of Amsterdam)*; Vassilis Anagiannis (University of Amsterdam); Maurice Weiler (University of Amsterdam); Pim de Haan (University of Amsterdam); Taco S. Cohen (Qualcomm AI Research); Max Welling (University of Amsterdam)
Meanfield theory of activation functions in Deep Neural Networks. Mirco Milletari (Microsoft)*; Thiparat Chotibut (SUTD); Paolo E. Trevisanutto (National University of Singapore)
Finite size corrections for neural network Gaussian processes. Joseph M Antognini (Whisper AI)*
Analysing the dynamics of online learning in over-parameterised two-layer neural networks. Sebastian Goldt (Institut de Physique théorique, Paris)*; Madhu Advani (Harvard University); Andrew Saxe (University of Oxford); Florent Krzakala (École Normale Supérieure); Lenka Zdeborova (CEA Saclay)
A Halo Merger Tree Generation and Evaluation Framework. Sandra Robles (Universidad Autónoma de Madrid); Jonathan Gómez (Pontificia Universidad Católica de Chile); Adín Ramírez Rivera (University of Campinas)*; Jenny Gonzáles (Pontificia Universidad Católica de Chile); Nelson Padilla (Pontificia Universidad Católica de Chile); Diego Dujovne (Universidad Diego Portales)
Learning Symmetries of Classical Integrable Systems. Roberto Bondesan (Qualcomm AI Research)*; Austen Lamacraft (Cavendish Laboratory, University of Cambridge, UK)
Pathological Spectrum of the Fisher Information Matrix in Deep Neural Networks. Ryo Karakida (National Institute of Advanced Industrial Science and Technology)*; Shotaro Akaho (AIST); Shun-ichi Amari (RIKEN)
How Noise during Training Affects the Hessian Spectrum. Mingwei Wei (Northwestern University); David Schwab (Facebook AI Research)*
A Quantum Field Theory of Representation Learning. Robert Bamler (University of California at Irvine)*; Stephan Mandt (University of California, Irvine)
Convergence Properties of Neural Networks on Separable Data. Remi Tachet des Combes (Microsoft Research Montreal)*; Mohammad Pezeshki (Mila & University of Montreal); Samira Shabanian (Microsoft, Canada); Aaron Courville (MILA, Université de Montréal); Yoshua Bengio (Mila)
Universality and Capacity Metrics in Deep Neural Networks. Michael Mahoney (University of California, Berkeley)*; Charles Martin (Calculation Consulting)
Asymptotics of Wide Networks from Feynman Diagrams. Guy Gur-Ari (Google)*; Ethan Dyer (Google)
Deep Learning on the 2-Dimensional Ising Model to Extract the Crossover Region. Nicholas Walker (Louisiana State Univ - Baton Rouge)*
Large Scale Structure of the Loss Landscape of Neural Networks. Stanislav Fort (Stanford University)*; Stanislaw Jastrzebski (New York University)
Momentum Enables Large Batch Training. Samuel L Smith (DeepMind)*; Erich Elsen (Google); Soham De (DeepMind)
Learning the Arrow of Time. Nasim Rahaman (University of Heidelberg)*; Steffen Wolf (Heidelberg University); Anirudh Goyal (University of Montreal); Roman Remme (Heidelberg University); Yoshua Bengio (Mila)
Scale Steerable Filters for Locally Scale-Invariant Convolutional Neural Networks. Rohan Ghosh (National University of Singapore)*; Anupam Gupta (National University of Singapore)
A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off. Yaniv Blumenfeld (Technion)*; Dar Gilboa (Columbia University); Daniel Soudry (Technion)
Rethinking Complexity in Deep Learning: A View from Function Space. Aristide Baratin (Mila, Université de Montréal)*; Thomas George (MILA, Université de Montréal); César Laurent (Mila, Université de Montréal); Valentin Thomas (MILA); Guillaume Lajoie (Université de Montréal, Mila); Simon Lacoste-Julien (Mila, Université de Montréal)
The Deep Learning Limit: Negative Neural Network eigenvalues just noise? Diego Granziol (Oxford)*; Stefan Zohren (University of Oxford); Stephen Roberts (Oxford); Dmitry P Vetrov (Higher School of Economics); Andrew Gordon Wilson (Cornell University); Timur Garipov (Samsung AI Center in Moscow)
Gradient descent in Gaussian random fields as a toy model for high-dimensional optimisation. Mariano Chouza (Tower Research Capital); Stephen Roberts (Oxford); Stefan Zohren (University of Oxford)*
Deep Learning for Inverse Problems. Abhejit Rajagopal (University of California, Santa Barbara)*; Vincent R Radzicki (University of California, Santa Barbara)
Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shotaro Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · Zhanglin Peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari
Author Information
Jaehoon Lee (Google Brain)
Jeffrey Pennington (Google Brain)
Yasaman Bahri (Google Brain)
Max Welling (University of Amsterdam & Qualcomm)
Surya Ganguli (Stanford)
Joan Bruna (New York University)