Workshop
Understanding and Improving Generalization in Deep Learning
Dilip Krishnan · Hossein Mobahi · Behnam Neyshabur · Peter Bartlett · Dawn Song · Nati Srebro

Fri Jun 14 08:30 AM -- 06:00 PM (PDT) @ Grand Ballroom A

The 1st workshop on Generalization in Deep Networks: Theory and Practice will be held as part of ICML 2019. Generalization is one of the fundamental problems of machine learning, and increasingly important as deep networks make their presence felt in domains with big, small, noisy or skewed data. This workshop will consider generalization from both theoretical and practical perspectives. We welcome contributions from paradigms such as representation learning, transfer learning and reinforcement learning. The workshop invites researchers to submit working papers in the following research areas:

Implicit regularization: the role of optimization algorithms in generalization
Explicit regularization methods
Network architecture choices that improve generalization
Empirical approaches to understanding generalization
Generalization bounds; empirical evaluation criteria to evaluate bounds
Robustness: generalizing to distributional shift, a.k.a. dataset shift
Generalization in the context of representation learning, transfer learning and deep reinforcement learning: definitions and empirical approaches

Fri 8:30 a.m. - 8:40 a.m. Opening Remarks

Fri 8:40 a.m. - 9:10 a.m. Keynote by Dan Roy: Progress on Nonvacuous Generalization Bounds (Invited Talk)
Generalization bounds are one of the main tools available for explaining the performance of learning algorithms. At the same time, most bounds in the literature are loose to an extent that raises the question as to whether these bounds actually have any explanatory power in the nonasymptotic regime of actual machine learning practice. I'll report on progress towards developing bounds and techniques, both statistical and computational, aimed at closing the gap between empirical performance and theoretical understanding.
Bio: Daniel Roy is an Assistant Professor in the Department of Statistical Sciences and, by courtesy, Computer Science at the University of Toronto, and a founding faculty member of the Vector Institute for Artificial Intelligence. Daniel is a recent recipient of an Ontario Early Researcher Award and Google Faculty Research Award. Before joining U of T, Daniel held a Newton International Fellowship from the Royal Academy of Engineering and a Research Fellowship at Emmanuel College, University of Cambridge. Daniel earned his S.B., M.Eng., and Ph.D. from the Massachusetts Institute of Technology; his dissertation on probabilistic programming won an MIT EECS Sprowls Dissertation Award. Daniel's group works on foundations of machine learning and statistics.

Fri 9:20 a.m. - 9:50 a.m. Keynote by Chelsea Finn: Training for Generalization (Invited Talk)
TBA.
Bio: Chelsea Finn is a research scientist at Google Brain, a post-doc at Berkeley AI Research Lab (BAIR), and will join the Stanford Computer Science faculty in Fall 2019. Finn’s research studies how new algorithms can enable machines to acquire intelligent behavior through learning and interaction, allowing them to perform a variety of complex sensorimotor skills in real-world settings. She has developed deep learning algorithms for concurrently learning visual perception and control in robotic manipulation skills, inverse reinforcement methods for scalable acquisition of nonlinear reward functions, and meta-learning algorithms that can enable fast, few-shot adaptation in both visual perception and deep reinforcement learning. Finn’s research has been recognized through an NSF graduate fellowship, the C.V. Ramamoorthy Distinguished Research Award, and the Technology Review 35 Under 35 Award, and her work has been covered by various media outlets, including the New York Times, Wired, and Bloomberg. With Sergey Levine and John Schulman, she also designed and taught a course on deep reinforcement learning, with thousands of followers online. Finn received a PhD in Computer Science from UC Berkeley and an S.B. in Electrical Engineering and Computer Science from MIT.

Fri 9:50 a.m. - 10:05 a.m. A Meta-Analysis of Overfitting in Machine Learning (Spotlight)
Authors: Sara Fridovich-Keil, Moritz Hardt, John Miller, Ben Recht, Rebecca Roelofs, Ludwig Schmidt and Vaishaal Shankar
Abstract: We conduct the first large meta-analysis of overfitting due to test set reuse in the machine learning community. Our analysis is based on over one hundred machine learning competitions hosted on the Kaggle platform over the course of several years. In each competition, numerous practitioners repeatedly evaluated their progress against a holdout set that forms the basis of a public ranking available throughout the competition. Performance on a separate test set used only once determined the final ranking. By systematically comparing the public ranking with the final ranking, we assess how much participants adapted to the holdout set over the course of a competition. Our longitudinal study shows, somewhat surprisingly, little evidence of substantial overfitting.
These findings speak to the robustness of the holdout method across different data domains, loss functions, model classes, and human analysts.

Fri 10:05 a.m. - 10:20 a.m. Uniform convergence may be unable to explain generalization in deep learning (Spotlight)
Authors: Vaishnavh Nagarajan and J. Zico Kolter
Abstract: We cast doubt on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well. While it is well-known that many existing bounds are numerically large, through a variety of experiments, we first bring to light another crucial and more concerning aspect of these bounds: in practice, these bounds can increase with the dataset size. Guided by our observations, we then present examples of overparameterized linear classifiers and neural networks trained by stochastic gradient descent (SGD) where uniform convergence provably cannot explain generalization, even if we take into account implicit regularization to the fullest extent possible. More precisely, even if we consider only the set of classifiers output by SGD that have test errors less than some small $\epsilon$, applying (two-sided) uniform convergence on this set of classifiers yields a generalization guarantee that is larger than $1-\epsilon$ and is therefore nearly vacuous.

Fri 10:20 a.m. - 10:40 a.m. Break and Poster Session
1- Uniform convergence may be unable to explain generalization in deep learning. Vaishnavh Nagarajan and J. Zico Kolter
2- The effects of optimization on generalization in infinitely wide neural networks. Anastasia Borovykh
3- Generalized Capsule Networks with Trainable Routing Procedure. Zhenhua Chen, Chuhua Wang, David Crandall and Tiancong Zhao
4- Implicit Regularization of Discrete Gradient Dynamics in Deep Linear Neural Networks. Gauthier Gidel, Francis Bach and Simon Lacoste-Julien
5- Stable Rank Normalization for Improved Generalization in Neural Networks. Amartya Sanyal, Philip H Torr and Puneet K Dokania
6- On improving deep learning generalization with adaptive sparse connectivity. Shiwei Liu, Decebal Constantin Mocanu and Mykola Pechenizkiy
7- Identity Connections in Residual Nets Improve Noise Stability. Shuzhi Yu and Carlo Tomasi
8- Factors for the Generalisation of Identity Relations by Neural Networks. Radha Manisha Kopparti and Tillman Weyde
9- Output-Constrained Bayesian Neural Networks. Wanqian Yang, Lars Lorch, Moritz A. Graule, Srivatsan Srinivasan, Anirudh Suresh, Jiayu Yao, Melanie F. Pradier and Finale Doshi-Velez
10- An Empirical Study on Hyperparameters and their Interdependence for RL Generalization. Xingyou Song, Yilun Du and Jacob Jackson
11- Towards Large Scale Structure of the Loss Landscape of Neural Networks. Stanislav Fort and Stanislaw Jastrzebski
12- Detecting Extrapolation with Influence Functions. David Madras, James Atwood and Alex D'Amour
13- Towards Task and Architecture-Independent Generalization Gap Predictors. Scott Yak, Hanna Mazzawi and Javier Gonzalvo
14- SGD Picks a Stable Enough Trajectory. Stanisław Jastrzębski and Stanislav Fort
15- MazeNavigator: A Customisable 3D Benchmark for Assessing Generalisation in Reinforcement Learning. Luke Harries, Sebastian Lee, Jaroslaw Rzepecki, Katja Hofmann and Sam Devlin
16- Utilizing Eye Gaze to Enhance the Generalization of Imitation Network to Unseen Environments. Congcong Liu, Yuying Chen, Lei Tai, Ming Liu and Bertram Shi
17- Investigating Under and Overfitting in Wasserstein Generative Adversarial Networks. Ben Adlam, Charles Weill and Amol Kapoor
18- An Empirical Evaluation of Adversarial Robustness under Transfer Learning. Todor Davchev, Timos Korres, Stathi Fotiadis, Nick Antonopoulos and Subramanian Ramamoorthy
19- On Adversarial Robustness of Small vs Large Batch Training. Sandesh Kamath, Amit Deshpande and K V Subrahmanyam
20- The Principle of Unchanged Optimality in Reinforcement Learning Generalization. Xingyou Song and Alex Irpan
21- On the Generalization Capability of Memory Networks for Reasoning. Monireh Ebrahimi, Md Kamruzzaman Sarker, Federico Bianchi, Ning Xie, Aaron Eberhart, Derek Doran and Pascal Hitzler
22- Visualizing How Embeddings Generalize. Xiaotong Liu, Hong Xuan, Zeyu Zhang, Abby Stylianou and Robert Pless
23- Theoretical Analysis of the Fixup Initialization for Fast Convergence and High Generalization Ability. Yasutaka Furusho and Kazushi Ikeda
24- Data-Dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation. Colin Wei and Tengyu Ma
25- Few-Shot Transfer Learning From Multiple Pre-Trained Networks. Joshua Ka-Wing Lee, Prasanna Sattigeri and Gregory Wornell
26- Uniform Stability and High Order Approximation of SGLD in Non-Convex Learning. Mufan Li and Maxime Gazeau
27- Better Generalization with Adaptive Adversarial Training. Amit Deshpande, Sandesh Kamath and K V Subrahmanyam
28- Adversarial Training Generalizes Spectral Norm Regularization. Kevin Roth, Yannic Kilcher and Thomas Hofmann
29- A Causal View on Robustness of Neural Networks. Cheng Zhang and Yingzhen Li
30- Improving PAC-Bayes bounds for neural networks using geometric properties of the training method. Anirbit Mukherjee, Dan Roy, Pushpendre Rastogi and Jun Yang
31- An Analysis of the Effect of Invariance on Generalization in Neural Networks. Clare Lyle, Marta Kwiatkowska and Yarin Gal
32- Data-Dependent Mutual Information Bounds for SGLD. Jeffrey Negrea, Daniel Roy, Gintare Karolina Dziugaite, Mahdi Haghifam and Ashish Khisti
33- Comparing normalization in conditional computation tasks. Vincent Michalski, Vikram Voleti, Samira Ebrahimi Kahou, Anthony Ortiz, Pascal Vincent, Chris Pal and Doina Precup
34- Weight and Batch Normalization implement Classical Generalization Bounds. Andrzej Banburski, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Jack Hidary and Tomaso Poggio
35- Increasing batch size through instance repetition improves generalization. Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler and Daniel Soudry
36- Zero-Shot Learning from scratch: leveraging local compositional representations. Tristan Sylvain, Linda Petrini and Devon Hjelm
37- Circuit-Based Intrinsic Methods to Detect Overfitting. Satrajit Chatterjee and Alan Mishchenko
38- Dimension Reduction Approach for Interpretability of Sequence to Sequence Recurrent Neural Networks. Kun Su and Eli Shlizerman
39- Tight PAC-Bayesian generalization error bounds for deep learning. Guillermo Valle Perez, Chico Q. Camargo and Ard A. Louis
40- How Learning Rate and Delay Affect Minima Selection in Asynchronous Training of Neural Networks: Toward Closing the Generalization Gap. Niv Giladi, Mor Shpigel Nacson, Elad Hoffer and Daniel Soudry
41- Making Convolutional Networks Shift-Invariant Again. Richard Zhang
42- A Meta-Analysis of Overfitting in Machine Learning. Sara Fridovich-Keil, Moritz Hardt, John Miller, Ben Recht, Rebecca Roelofs, Ludwig Schmidt and Vaishaal Shankar
43- Kernelized Capsule Networks. Taylor Killian, Justin Goodwin, Olivia Brown and Sung Son
44- Model similarity mitigates test set overuse. Moritz Hardt, Horia Mania, John Miller, Ben Recht and Ludwig Schmidt
45- Understanding Generalization of Deep Neural Networks Trained with Noisy Labels. Wei Hu, Zhiyuan Li and Dingli Yu
46- Domainwise Classification Network for Unsupervised Domain Adaptation. Seonguk Seo, Yumin Suh, Bohyung Han, Taeho Lee, Tackgeun You, Woong-Gi Chang and Suha Kwak
47- The Generalization-Stability Tradeoff in Neural Network Pruning. Brian Bartoldson, Ari Morcos, Adrian Barbu and Gordon Erlebacher
48- Information matrices and generalization. Valentin Thomas, Fabian Pedregosa, Bart van Merriënboer, Pierre-Antoine Manzagol, Yoshua Bengio and Nicolas Le Roux
49- Adaptively Preconditioned Stochastic Gradient Langevin Dynamics. Chandrasekaran Anirudh Bhardwaj
50- Additive or Concatenating Skip-connections Improve Data Separability. Yasutaka Furusho and Kazushi Ikeda
51- On the Inductive Bias of Neural Tangent Kernels. Alberto Bietti and Julien Mairal
52- PAC Bayes Bound Minimization via Kronecker Normalizing Flows. Chin-Wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste and Aaron Courville
53- SGD on Neural Networks Learns Functions of Increasing Complexity. Preetum Nakkiran, Gal Kaplun, Dimitris Kalimeris, Tristan Yang, Ben Edelman, Fred Zhang and Boaz Barak
54- Overparameterization without Overfitting: Jacobian-based Generalization Guarantees for Neural Networks. Samet Oymak, Mingchen Li, Zalan Fabian and Mahdi Soltanolkotabi
55- Incorrect gradients and regularization: a perspective of loss landscapes. Mehrdad Yazdani
56- Divide and Conquer: Leveraging Intermediate Feature Representations for Quantized Training of Neural Networks. Ahmed Youssef, Prannoy Pilligundla and Hadi Esmaeilzadeh
57- SinReQ: Generalized Sinusoidal Regularization for Low-Bitwidth Deep Quantized Training. Ahmed Youssef, Prannoy Pilligundla and Hadi Esmaeilzadeh
58- Natural Adversarial Examples. Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt and Dawn Song
59- On the Properties of the Objective Landscapes and Generalization of Gradient-Based Meta-Learning. Simon Guiroy, Vikas Verma and Christopher Pal
60- Angular Visual Hardness. Beidi Chen, Weiyang Liu, Animesh Garg, Zhiding Yu, Anshumali Shrivastava and Animashree Anandkumar
61- Luck Matters: Understanding Training Dynamics of Deep ReLU Networks. Yuandong Tian, Tina Jiang, Qucheng Gong and Ari Morcos
62- Understanding of Generalization in Deep Learning via Tensor Methods. Jingling Li, Yanchao Sun, Ziyin Liu, Taiji Suzuki and Furong Huang
63- Learning from Rules Performs as Implicit Regularization. Hossein Hosseini, Ramin Moslemi, Ali Hooshmand and Ratnesh Sharma
64- Stochastic Mirror Descent on Overparameterized Nonlinear Models: Convergence, Implicit Regularization, and Generalization. Navid Azizan, Sahin Lale and Babak Hassibi
65- Scaling Characteristics of Sequential Multitask Learning: Networks Naturally Learn to Learn. Guy Davidson and Michael Mozer
66- Size-free generalization bounds for convolutional neural networks. Phillip Long and Hanie Sedghi

Fri 10:40 a.m. - 11:10 a.m. Keynote by Sham Kakade: Prediction, Learning, and Memory (Invited Talk)
Building accurate language models that capture meaningful long-term dependencies is a core challenge in language processing. We consider the problem of predicting the next observation given a sequence of past observations, specifically focusing on the question of how to make accurate predictions that explicitly leverage long-range dependencies. Empirically, and perhaps surprisingly, we show that state-of-the-art language models, including LSTMs and Transformers, do not capture even basic properties of natural language: the entropy rates of their generations drift dramatically upward over time. We also provide provable methods to mitigate this phenomenon: specifically, we provide a calibration-based approach to improve an estimated model based on any measurable long-term mismatch between the estimated model and the true underlying generative distribution. More generally, we will also present fundamental information theoretic and computational limits of sequential prediction with a memory.
Bio: Sham Kakade is a Washington Research Foundation Data Science Chair, with a joint appointment in the Department of Computer Science and the Department of Statistics at the University of Washington. He works on the theoretical foundations of machine learning, focusing on designing provable and practical statistically and computationally efficient algorithms.
Amongst his contributions, with a diverse set of collaborators, are: establishing principled approaches in reinforcement learning (including the natural policy gradient, conservative policy iteration, and the PAC-MDP framework); optimal algorithms in the stochastic and non-stochastic multi-armed bandit problems (including the widely used linear bandit and the Gaussian process bandit models); computationally and statistically efficient tensor decomposition methods for estimation of latent variable models (including estimation of mixture of Gaussians, latent Dirichlet allocation, hidden Markov models, and overlapping communities in social networks); and faster algorithms for large scale convex and nonconvex optimization (including how to escape from saddle points efficiently). He is the recipient of the IBM Goldberg best paper award (in 2007) for contributions to fast nearest neighbor search and the best paper, INFORMS Revenue Management and Pricing Section Prize (2014). He has been program chair for COLT 2011. Sham completed his Ph.D. at the Gatsby Computational Neuroscience Unit at University College London, under the supervision of Peter Dayan, and he was a postdoc at the Dept. of Computer Science, University of Pennsylvania, under the supervision of Michael Kearns. Sham was an undergraduate at Caltech, studying physics under the supervision of John Preskill. Sham has been a Principal Research Scientist at Microsoft Research, New England, an associate professor at the Department of Statistics, Wharton, UPenn, and an assistant professor at the Toyota Technological Institute at Chicago.

Fri 11:10 a.m. - 11:40 a.m. Keynote by Mikhail Belkin: A Hard Look at Generalization and its Theories (Invited Talk)
"A model with zero training error is overfit to the training data and will typically generalize poorly" goes statistical textbook wisdom. Yet in modern practice over-parametrized deep networks with near perfect (interpolating) fit on training data still show excellent test performance. This fact is difficult to reconcile with most modern theories of generalization that rely on bounding the difference between the empirical and expected error. Indeed, as we will discuss, bounds of that type cannot be expected to explain generalization of interpolating models. I will proceed to show how classical and modern models can be unified within a new "double descent" risk curve that extends the usual U-shaped bias-variance trade-off curve beyond the point of interpolation. This curve delimits the regime of applicability of classical bounds and the regime where new analyses are required. I will give examples of first theoretical analyses in that modern regime and discuss the (considerable) gaps in our knowledge. Finally I will briefly discuss some implications for optimization.
Bio: Mikhail Belkin is a Professor in the departments of Computer Science and Engineering and Statistics at the Ohio State University. He received a PhD in mathematics from the University of Chicago in 2003. His research focuses on understanding the fundamental structure in data, the principles of recovering these structures and their computational, mathematical and statistical properties. This understanding, in turn, leads to algorithms for dealing with real-world data. His work includes algorithms such as Laplacian Eigenmaps and Manifold Regularization based on ideas of classical differential geometry, which have been widely used for analyzing non-linear high-dimensional data. He has done work on spectral methods, Gaussian mixture models, kernel methods and applications. Recently his work has been focussed on understanding generalization and optimization in modern over-parametrized machine learning. Prof. Belkin is a recipient of an NSF Career Award and a number of best paper and other awards and has served on the editorial boards of the Journal of Machine Learning Research and IEEE PAMI.

Fri 11:40 a.m. - 11:55 a.m. Towards Task and Architecture-Independent Generalization Gap Predictors (Spotlight)
Authors: Scott Yak, Hanna Mazzawi and Javier Gonzalvo
Abstract: Can we use deep learning to predict when deep learning works? Our results suggest the affirmative. We created a dataset by training 13,500 neural networks with different architectures, on different variations of spiral datasets, and using different optimization parameters. We used this dataset to train task-independent and architecture-independent generalization gap predictors for those neural networks. We extend Jiang et al. (2018) to also use DNNs and RNNs and show that they outperform the linear model, obtaining $R^2=0.965$. We also show results for architecture-independent, task-independent, and out-of-distribution generalization gap prediction tasks. Both DNNs and RNNs consistently and significantly outperform linear models, with RNNs obtaining $R^2=0.584$.

Fri 11:55 a.m. - 12:10 p.m. Data-Dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation (Spotlight)
Authors: Colin Wei and Tengyu Ma
Abstract: Existing Rademacher complexity bounds for neural networks rely only on norm control of the weight matrices and depend exponentially on depth via a product of the matrix norms. Lower bounds show that this exponential dependence on depth is unavoidable when no additional properties of the training data are considered. We suspect that this conundrum comes from the fact that these bounds depend on the training data only through the margin. In practice, many data-dependent techniques such as Batchnorm improve the generalization performance.
For feedforward neural nets as well as RNNs, we obtain tighter Rademacher complexity bounds by considering additional data-dependent properties of the network: the norms of the hidden layers of the network, and the norms of the Jacobians of each layer with respect to the previous layers. Our bounds scale polynomially in depth when these empirical quantities are small, as is usually the case in practice. To obtain these bounds, we develop general tools for augmenting a sequence of functions to make their composition Lipschitz and then covering the augmented functions. Inspired by our theory, we directly regularize the network's Jacobians during training and empirically demonstrate that this improves test performance.

Fri 12:10 p.m. - 1:30 p.m. Lunch and Poster Session
Same posters (1 through 66) as listed for the 10:20 a.m. Break and Poster Session.

Fri 1:30 p.m. - 2:00 p.m. Keynote by Aleksander Mądry: Are All Features Created Equal? (Invited Talk)
TBA.
Bio: Aleksander Mądry is an Associate Professor of Computer Science in the MIT EECS Department and a principal investigator in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).
He received his PhD from MIT in 2011 and, prior to joining the MIT faculty, he spent some time at Microsoft Research New England and on the faculty of EPFL. Aleksander’s research interests span algorithms, continuous optimization, the science of deep learning and understanding machine learning from a robustness perspective. Aleksander Madry 🔗 Fri 2:00 p.m. - 2:30 p.m. Keynote by Jason Lee: On the Foundations of Deep Learning: SGD, Overparametrization, and Generalization (Invited Talk) Deep Learning has had phenomenal empirical successes in many domains including computer vision, natural language processing, and speech recognition. To consolidate and boost the empirical success, we need to develop a more systematic and deeper understanding of the elusive principles of deep learning. In this talk, I will provide analysis of several elements of deep learning including non-convex optimization, overparametrization, and generalization error. First, we show that gradient descent and many other algorithms are guaranteed to converge to a local minimizer of the loss. For several interesting problems including the matrix completion problem, this guarantees that we converge to a global minimum. Then we will show that gradient descent converges to a global minimizer for deep overparametrized networks. Finally, we analyze the generalization error by showing that a subtle combination of SGD, logistic loss, and architecture promotes large margin classifiers, which are guaranteed to have low generalization error. Together, these results show that on overparametrized deep networks SGD finds solutions with both low training and test error. Bio: Jason Lee is an assistant professor in Data Sciences and Operations at the University of Southern California. Prior to that, he was a postdoctoral researcher at UC Berkeley working with Michael Jordan. Jason received his PhD at Stanford University advised by Trevor Hastie and Jonathan Taylor.
His research interests are in statistics, machine learning, and optimization. Lately, he has worked on high dimensional statistical inference, analysis of non-convex optimization algorithms, and theory for deep learning. Jason Lee 🔗 Fri 2:30 p.m. - 2:45 p.m. Towards Large Scale Structure of the Loss Landscape of Neural Networks (Spotlight) Authors: Stanislav Fort and Stanislaw Jastrzebski Abstract: There are many surprising and perhaps counter-intuitive properties of optimization of deep neural networks. We propose and experimentally verify a unified phenomenological model of the loss landscape that incorporates many of them. Our core idea is to model the loss landscape as a set of high dimensional sheets that together form a distributed, large-scale, inter-connected structure. For instance, we predict the existence of low loss subspaces connecting a set (not only a pair) of solutions, and verify it experimentally. We conclude by showing that hyperparameter choices such as learning rate, batch size, dropout and $L_2$ regularization affect the path the optimizer takes through the landscape in a similar way. 🔗 Fri 2:45 p.m. - 3:00 p.m. Zero-Shot Learning from scratch: leveraging local compositional representations (Spotlight) Authors: Tristan Sylvain, Linda Petrini and Devon Hjelm Abstract: Zero-shot classification is a task focused on generalization where no instance from the target classes is seen during training. To allow for test-time transfer, each class is annotated with semantic information, commonly in the form of attributes or text descriptions. While classical zero-shot learning does not specify how this problem should be solved, the most successful approaches rely on features extracted from encoders pre-trained on large datasets, most commonly ImageNet. This approach raises important questions that might otherwise distract researchers from answering fundamental questions about representation learning and generalization.
For instance, one should wonder to what extent these methods actually learn representations robust with respect to the task, rather than simply exploiting information stored in the encoder. To remove these distractors, we propose a more challenging setting: Zero-Shot Learning from scratch, which effectively forbids the use of encoders fine-tuned on other datasets. Our analysis on this setting highlights the importance of local information and compositional representations. 🔗 Fri 3:00 p.m. - 3:30 p.m. Break and Poster Session 1- Uniform convergence may be unable to explain generalization in deep learning. Vaishnavh Nagarajan and J. Zico Kolter 2- The effects of optimization on generalization in infinitely wide neural networks. Anastasia Borovykh 3- Generalized Capsule Networks with Trainable Routing Procedure. Zhenhua Chen, Chuhua Wang, David Crandall and Tiancong Zhao 4- Implicit Regularization of Discrete Gradient Dynamics in Deep Linear Neural Networks. Gauthier Gidel, Francis Bach and Simon Lacoste-Julien 5- Stable Rank Normalization for Improved Generalization in Neural Networks. Amartya Sanyal, Philip H Torr and Puneet K Dokania 6- On improving deep learning generalization with adaptive sparse connectivity. Shiwei Liu, Decebal Constantin Mocanu and Mykola Pechenizkiy 7- Identity Connections in Residual Nets Improve Noise Stability. Shuzhi Yu and Carlo Tomasi 8- Factors for the Generalisation of Identity Relations by Neural Networks. Radha Manisha Kopparti and Tillman Weyde 9- Output-Constrained Bayesian Neural Networks. Wanqian Yang, Lars Lorch, Moritz A. Graule, Srivatsan Srinivasan, Anirudh Suresh, Jiayu Yao, Melanie F. Pradier and Finale Doshi-Velez 10- An Empirical Study on Hyperparameters and their Interdependence for RL Generalization. Xingyou Song, Yilun Du and Jacob Jackson 11- Towards Large Scale Structure of the Loss Landscape of Neural Networks. Stanislav Fort and Stanislaw Jastrzebski 12- Detecting Extrapolation with Influence Functions.
David Madras, James Atwood and Alex D'Amour 13- Towards Task and Architecture-Independent Generalization Gap Predictors. Scott Yak, Hanna Mazzawi and Javier Gonzalvo 14- SGD Picks a Stable Enough Trajectory. Stanisław Jastrzębski and Stanislav Fort 15- MazeNavigator: A Customisable 3D Benchmark for Assessing Generalisation in Reinforcement Learning. Luke Harries, Sebastian Lee, Jaroslaw Rzepecki, Katja Hofmann and Sam Devlin 16- Utilizing Eye Gaze to Enhance the Generalization of Imitation Network to Unseen Environments. Congcong Liu, Yuying Chen, Lei Tai, Ming Liu and Bertram Shi 17- Investigating Under and Overfitting in Wasserstein Generative Adversarial Networks. Ben Adlam, Charles Weill and Amol Kapoor 18- An Empirical Evaluation of Adversarial Robustness under Transfer Learning. Todor Davchev, Timos Korres, Stathi Fotiadis, Nick Antonopoulos and Subramanian Ramamoorthy 19- On Adversarial Robustness of Small vs Large Batch Training. Sandesh Kamath, Amit Deshpande and K V Subrahmanyam 20- The Principle of Unchanged Optimality in Reinforcement Learning Generalization. Xingyou Song and Alex Irpan 21- On the Generalization Capability of Memory Networks for Reasoning. Monireh Ebrahimi, Md Kamruzzaman Sarker, Federico Bianchi, Ning Xie, Aaron Eberhart, Derek Doran and Pascal Hitzler 22- Visualizing How Embeddings Generalize. Xiaotong Liu, Hong Xuan, Zeyu Zhang, Abby Stylianou and Robert Pless 23- Theoretical Analysis of the Fixup Initialization for Fast Convergence and High Generalization Ability. Yasutaka Furusho and Kazushi Ikeda 24- Data-Dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation. Colin Wei and Tengyu Ma 25- Few-Shot Transfer Learning From Multiple Pre-Trained Networks. Joshua Ka-Wing Lee, Prasanna Sattigeri and Gregory Wornell 26- Uniform Stability and High Order Approximation of SGLD in Non-Convex Learning. Mufan Li and Maxime Gazeau 27- Better Generalization with Adaptive Adversarial Training. 
Amit Deshpande, Sandesh Kamath and K V Subrahmanyam 28- Adversarial Training Generalizes Spectral Norm Regularization. Kevin Roth, Yannic Kilcher and Thomas Hofmann 29- A Causal View on Robustness of Neural Networks. Cheng Zhang and Yingzhen Li 30- Improving PAC-Bayes bounds for neural networks using geometric properties of the training method. Anirbit Mukherjee, Dan Roy, Pushpendre Rastogi and Jun Yang 31- An Analysis of the Effect of Invariance on Generalization in Neural Networks. Clare Lyle, Marta Kwiatkowska and Yarin Gal 32- Data-Dependent Mutual Information Bounds for SGLD. Jeffrey Negrea, Daniel Roy, Gintare Karolina Dziugaite, Mahdi Haghifam and Ashish Khisti 33- Comparing normalization in conditional computation tasks. Vincent Michalski, Vikram Voleti, Samira Ebrahimi Kahou, Anthony Ortiz, Pascal Vincent, Chris Pal and Doina Precup 34- Weight and Batch Normalization implement Classical Generalization Bounds. Andrzej Banburski, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Jack Hidary and Tomaso Poggio 35- Increasing batch size through instance repetition improves generalization. Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler and Daniel Soudry 36- Zero-Shot Learning from scratch: leveraging local compositional representations. Tristan Sylvain, Linda Petrini and Devon Hjelm 37- Circuit-Based Intrinsic Methods to Detect Overfitting. Satrajit Chatterjee and Alan Mishchenko 38- Dimension Reduction Approach for Interpretability of Sequence to Sequence Recurrent Neural Networks. Kun Su and Eli Shlizerman 39- Tight PAC-Bayesian generalization error bounds for deep learning. Guillermo Valle Perez, Chico Q. Camargo and Ard A. Louis 40- How Learning Rate and Delay Affect Minima Selection in Asynchronous Training of Neural Networks: Toward Closing the Generalization Gap. Niv Giladi, Mor Shpigel Nacson, Elad Hoffer and Daniel Soudry 41- Making Convolutional Networks Shift-Invariant Again.
Richard Zhang 42- A Meta-Analysis of Overfitting in Machine Learning. Sara Fridovich-Keil, Moritz Hardt, John Miller, Ben Recht, Rebecca Roelofs, Ludwig Schmidt and Vaishaal Shankar 43- Kernelized Capsule Networks. Taylor Killian, Justin Goodwin, Olivia Brown and Sung Son 44- Model similarity mitigates test set overuse. Moritz Hardt, Horia Mania, John Miller, Ben Recht and Ludwig Schmidt 45- Understanding Generalization of Deep Neural Networks Trained with Noisy Labels. Wei Hu, Zhiyuan Li and Dingli Yu 46- Domainwise Classification Network for Unsupervised Domain Adaptation. Seonguk Seo, Yumin Suh, Bohyung Han, Taeho Lee, Tackgeun You, Woong-Gi Chang and Suha Kwak 47- The Generalization-Stability Tradeoff in Neural Network Pruning. Brian Bartoldson, Ari Morcos, Adrian Barbu and Gordon Erlebacher 48- Information matrices and generalization. Valentin Thomas, Fabian Pedregosa, Bart van Merriënboer, Pierre-Antoine Manzagol, Yoshua Bengio and Nicolas Le Roux 49- Adaptively Preconditioned Stochastic Gradient Langevin Dynamics. Chandrasekaran Anirudh Bhardwaj 50- Additive or Concatenating Skip-connections Improve Data Separability. Yasutaka Furusho and Kazushi Ikeda 51- On the Inductive Bias of Neural Tangent Kernels. Alberto Bietti and Julien Mairal 52- PAC Bayes Bound Minimization via Kronecker Normalizing Flows. Chin-Wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste and Aaron Courville 53- SGD on Neural Networks Learns Functions of Increasing Complexity. Preetum Nakkiran, Gal Kaplun, Dimitris Kalimeris, Tristan Yang, Ben Edelman, Fred Zhang and Boaz Barak 54- Overparameterization without Overfitting: Jacobian-based Generalization Guarantees for Neural Networks. Samet Oymak, Mingchen Li, Zalan Fabian and Mahdi Soltanolkotabi 55- Incorrect gradients and regularization: a perspective of loss landscapes. Mehrdad Yazdani 56- Divide and Conquer: Leveraging Intermediate Feature Representations for Quantized Training of Neural Networks. 
Ahmed Youssef, Prannoy Pilligundla and Hadi Esmaeilzadeh 57- SinReQ: Generalized Sinusoidal Regularization for Low-Bitwidth Deep Quantized Training. Ahmed Youssef, Prannoy Pilligundla and Hadi Esmaeilzadeh 58- Natural Adversarial Examples. Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt and Dawn Song 59- On the Properties of the Objective Landscapes and Generalization of Gradient-Based Meta-Learning. Simon Guiroy, Vikas Verma and Christopher Pal 60- Angular Visual Hardness. Beidi Chen, Weiyang Liu, Animesh Garg, Zhiding Yu, Anshumali Shrivastava and Animashree Anandkumar 61- Luck Matters: Understanding Training Dynamics of Deep ReLU Networks. Yuandong Tian, Tina Jiang, Qucheng Gong and Ari Morcos 62- Understanding of Generalization in Deep Learning via Tensor Methods. Jingling Li, Yanchao Sun, Ziyin Liu, Taiji Suzuki and Furong Huang 63- Learning from Rules Performs as Implicit Regularization. Hossein Hosseini, Ramin Moslemi, Ali Hooshmand and Ratnesh Sharma 64- Stochastic Mirror Descent on Overparameterized Nonlinear Models: Convergence, Implicit Regularization, and Generalization. Navid Azizan, Sahin Lale and Babak Hassibi 65- Scaling Characteristics of Sequential Multitask Learning: Networks Naturally Learn to Learn. Guy Davidson and Michael Mozer 66- Size-free generalization bounds for convolutional neural networks. Phillip Long and Hanie Sedghi 🔗 Fri 3:30 p.m. - 4:30 p.m. Panel Discussion (Nati Srebro, Dan Roy, Chelsea Finn, Mikhail Belkin, Aleksander Mądry, Jason Lee) (Panel Discussion) Nati Srebro · Daniel Roy · Chelsea Finn · Mikhail Belkin · Aleksander Madry · Jason Lee 🔗 Fri 4:30 p.m. - 4:45 p.m. Overparameterization without Overfitting: Jacobian-based Generalization Guarantees for Neural Networks (Spotlight) Authors: Samet Oymak, Mingchen Li, Zalan Fabian and Mahdi Soltanolkotabi Abstract: Many modern neural network architectures contain many more parameters than the size of the training data. 
Such networks can easily overfit to training data, hence it is crucial to understand the fundamental principles that facilitate good test accuracy. This paper explores the generalization capabilities of neural networks trained via gradient descent. We show that the Jacobian matrix associated with the network dictates the directions where learning is generalizable and fast versus directions where overfitting occurs and learning is slow. We develop a bias-variance theory which provides a control knob to split the Jacobian spectrum into "information" and "nuisance" spaces associated with the large and small singular values of the Jacobian. We show that (i) over the information space learning is fast and we can quickly train a model with zero training loss that can also generalize well, (ii) over the nuisance subspace overfitting might result in higher variance hence early stopping can help with generalization at the expense of some bias. We conduct numerical experiments on deep networks that corroborate our theory and demonstrate that: (i) the Jacobian of typical networks exhibits a bimodal structure with a few large singular values and many small ones leading to a low-dimensional information space (ii) most of the useful information lies on the information space where learning happens quickly. 🔗 Fri 4:45 p.m. - 5:00 p.m. How Learning Rate and Delay Affect Minima Selection in Asynchronous Training of Neural Networks: Toward Closing the Generalization Gap (Spotlight) Authors: Niv Giladi, Mor Shpigel Nacson, Elad Hoffer and Daniel Soudry Abstract: Background: Recent developments have made it possible to accelerate neural network training significantly using large batch sizes and data parallelism. Training in an asynchronous fashion, where delay occurs, can make training even more scalable. However, asynchronous training has its pitfalls, mainly a degradation in generalization, even after convergence of the algorithm.
This gap remains not well understood, as theoretical analysis so far mainly focused on the convergence rate of asynchronous methods. Contributions: We examine asynchronous training from the perspective of dynamical stability. We find that the degree of delay interacts with the learning rate, to change the set of minima accessible by an asynchronous stochastic gradient descent algorithm. We derive closed-form rules on how the hyperparameters could be changed while keeping the accessible set the same. Specifically, for high delay values, we find that the learning rate should be decreased inversely with the delay, and discuss the effect of momentum. We provide empirical experiments to validate our theoretical findings. 🔗 Fri 5:00 p.m. - 6:00 p.m. Poster Session 🔗