Talk
A Closer Look at Memorization in Deep Networks
David Krueger · Yoshua Bengio · Stanislaw Jastrzebski · Maxinder S. Kanwal · Nicolas Ballas · Asja Fischer · Emmanuel Bengio · Devansh Arpit · Tegan Maharaj · Aaron Courville · Simon Lacoste-Julien

Mon Aug 07 10:30 PM -- 10:48 PM (PDT) @ Darling Harbour Theatre

We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that for appropriately tuned explicit regularization (e.g., dropout) we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that notions of effective capacity that are dataset independent are unlikely to explain the generalization performance of deep networks when trained with gradient-based methods, because the training data itself plays an important role in determining the degree of memorization.
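The noise vs. real data comparison in the abstract can be pictured with a small sketch. The abstract does not say how the noise datasets were built, so everything below is an assumption for illustration: the helper names `corrupt_labels` and `corrupt_inputs`, the uniform label redraw, and the standard Gaussian input noise are hypothetical choices, not the authors' procedure.

```python
import random


def corrupt_labels(labels, num_classes, noise_fraction, seed=0):
    """Return (noisy_labels, corrupted_indices).

    Hypothetical helper: a `noise_fraction` share of the examples gets a
    label redrawn uniformly from range(num_classes). A redrawn label may
    happen to equal the original one.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_noisy = round(noise_fraction * len(noisy))
    corrupted = sorted(rng.sample(range(len(noisy)), n_noisy))
    for i in corrupted:
        noisy[i] = rng.randrange(num_classes)
    return noisy, corrupted


def corrupt_inputs(inputs, noise_fraction, seed=0):
    """Return (noisy_inputs, corrupted_indices).

    Hypothetical helper: a `noise_fraction` share of the input vectors is
    replaced by standard Gaussian noise of the same length.
    """
    rng = random.Random(seed)
    noisy = [list(row) for row in inputs]
    n_noisy = round(noise_fraction * len(noisy))
    corrupted = sorted(rng.sample(range(len(noisy)), n_noisy))
    for i in corrupted:
        noisy[i] = [rng.gauss(0.0, 1.0) for _ in noisy[i]]
    return noisy, corrupted


if __name__ == "__main__":
    labels = [i % 10 for i in range(100)]
    noisy_labels, corrupted = corrupt_labels(labels, num_classes=10, noise_fraction=0.3)
    print(len(corrupted), "of", len(labels), "labels were redrawn")
```

Training the same network on the clean and the corrupted copies, with and without a regularizer such as dropout, would then allow the kind of training-performance comparison the abstract describes.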

Author Information

David Krueger (MILA)
Yoshua Bengio (U. Montreal)

Yoshua Bengio is recognized as one of the world’s leading experts in artificial intelligence and a pioneer in deep learning. Since 1993, he has been a professor in the Department of Computer Science and Operational Research at the Université de Montréal. He is the founder and scientific director of Mila, the Quebec Institute of Artificial Intelligence, the world’s largest university-based research group in deep learning. He is a member of the NeurIPS board, co-founder and general chair of the ICLR conference, and program director of the CIFAR program on Learning in Machines and Brains, and he is a Fellow of the same institution. In 2018, Yoshua Bengio ranked as the computer scientist with the most new citations worldwide, thanks to his many publications. In 2019, he received the ACM A.M. Turing Award, “the Nobel Prize of Computing”, jointly with Geoffrey Hinton and Yann LeCun for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing. In 2020, he was nominated a Fellow of the Royal Society of London.

Stanislaw Jastrzebski (Jagiellonian University)
Maxinder S. Kanwal (UC Berkeley)
Nicolas Ballas (Université de Montréal)
Asja Fischer (Computer Science Department, University of Bonn)
Emmanuel Bengio (McGill University)
Devansh Arpit
Tegan Maharaj
Aaron Courville (University of Montreal)
Simon Lacoste-Julien (University of Montreal)

More from the Same Authors

  • 2021 : Gradient Starvation: A Learning Proclivity in Neural Networks »
    Mohammad Pezeshki · Sékou-Oumar Kaba · Yoshua Bengio · Aaron Courville · Doina Precup · Guillaume Lajoie
  • 2022 Poster: Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks »
    Nan Wu · Stanislaw Jastrzebski · Kyunghyun Cho · Krzysztof J Geras
  • 2022 Spotlight: Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks »
    Nan Wu · Stanislaw Jastrzebski · Kyunghyun Cho · Krzysztof J Geras
  • 2021 Poster: Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization »
    Stanislaw Jastrzebski · Devansh Arpit · Oliver Astrand · Giancarlo Kerg · Huan Wang · Caiming Xiong · Richard Socher · Kyunghyun Cho · Krzysztof J Geras
  • 2021 Spotlight: Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization »
    Stanislaw Jastrzebski · Devansh Arpit · Oliver Astrand · Giancarlo Kerg · Huan Wang · Caiming Xiong · Richard Socher · Kyunghyun Cho · Krzysztof J Geras
  • 2019 : Poster discussion »
    Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari
  • 2019 : Networking Lunch (provided) + Poster Session »
    Abraham Stanway · Alex Robson · Aneesh Rangnekar · Ashesh Chattopadhyay · Ashley Pilipiszyn · Benjamin LeRoy · Bolong Cheng · Ce Zhang · Chaopeng Shen · Christian Schroeder · Christian Clough · Clement DUHART · Clement Fung · Cozmin Ududec · Dali Wang · David Dao · di wu · Dimitrios Giannakis · Dino Sejdinovic · Doina Precup · Duncan Watson-Parris · Gege Wen · George Chen · Gopal Erinjippurath · Haifeng Li · Han Zou · Herke van Hoof · Hillary A Scannell · Hiroshi Mamitsuka · Hongbao Zhang · Jaegul Choo · James Wang · James Requeima · Jessica Hwang · Jinfan Xu · Johan Mathe · Jonathan Binas · Joonseok Lee · Kalai Ramea · Kate Duffy · Kevin McCloskey · Kris Sankaran · Lester Mackey · Letif Mones · Loubna Benabbou · Lynn Kaack · Matthew Hoffman · Mayur Mudigonda · Mehrdad Mahdavi · Michael McCourt · Mingchao Jiang · Mohammad Mahdi Kamani · Neel Guha · Niccolo Dalmasso · Nick Pawlowski · Nikola Milojevic-Dupont · Paulo Orenstein · Pedram Hassanzadeh · Pekka Marttinen · Ramesh Nair · Sadegh Farhang · Samuel Kaski · Sandeep Manjanna · Sasha Luccioni · Shuby Deshpande · Soo Kim · Soukayna Mouatadid · Sunghyun Park · Tao Lin · Telmo Felgueira · Thomas Hornigold · Tianle Yuan · Tom Beucler · Tracy Cui · Volodymyr Kuleshov · Wei Yu · yang song · Ydo Wexler · Yoshua Bengio · Zhecheng Wang · Zhuangfang Yi · Zouheir Malki
  • 2019 : Poster Session 1 (all papers) »
    Matilde Gargiani · Yochai Zur · Chaim Baskin · Evgenii Zheltonozhskii · Liam Li · Ameet Talwalkar · Xuedong Shang · Harkirat Singh Behl · Atilim Gunes Baydin · Ivo Couckuyt · Tom Dhaene · Chieh Lin · Wei Wei · Min Sun · Orchid Majumder · Michele Donini · Yoshihiko Ozaki · Ryan P. Adams · Christian Geißler · Ping Luo · zhanglin peng · Ruimao Zhang · John Langford · Rich Caruana · Debadeepta Dey · Charles Weill · Xavi Gonzalvo · Scott Yang · Scott Yak · Eugen Hotaj · Vladimir Macko · Mehryar Mohri · Corinna Cortes · Stefan Webb · Jonathan Chen · Martin Jankowiak · Noah Goodman · Aaron Klein · Frank Hutter · Mojan Javaheripi · Mohammad Samragh · Sungbin Lim · Taesup Kim · SUNGWOONG KIM · Michael Volpp · Iddo Drori · Yamuna Krishnamurthy · Kyunghyun Cho · Stanislaw Jastrzebski · Quentin de Laroussilhe · Mingxing Tan · Xiao Ma · Neil Houlsby · Andrea Gesmundo · Zalán Borsos · Krzysztof Maziarz · Felipe Petroski Such · Joel Lehman · Kenneth Stanley · Jeff Clune · Pieter Gijsbers · Joaquin Vanschoren · Felix Mohr · Eyke Hüllermeier · Zheng Xiong · Wenpeng Zhang · wenwu zhu · Weijia Shao · Aleksandra Faust · Michal Valko · Michael Y Li · Hugo Jair Escalante · Marcel Wever · Andrey Khorlin · Tara Javidi · Anthony Francis · Saurajit Mukherjee · Jungtaek Kim · Michael McCourt · Saehoon Kim · Tackgeun You · Seungjin Choi · Nicolas Knudde · Alexander Tornede · Ghassen Jerfel
  • 2019 Poster: State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations »
    Alex Lamb · Jonathan Binas · Anirudh Goyal · Sandeep Subramanian · Ioannis Mitliagkas · Yoshua Bengio · Michael Mozer
  • 2019 Poster: Parameter-Efficient Transfer Learning for NLP »
    Neil Houlsby · Andrei Giurgiu · Stanislaw Jastrzebski · Bruna Morrone · Quentin de Laroussilhe · Andrea Gesmundo · Mona Attariyan · Sylvain Gelly
  • 2019 Poster: On the Spectral Bias of Neural Networks »
    Nasim Rahaman · Aristide Baratin · Devansh Arpit · Felix Draxler · Min Lin · Fred Hamprecht · Yoshua Bengio · Aaron Courville
  • 2019 Oral: On the Spectral Bias of Neural Networks »
    Nasim Rahaman · Aristide Baratin · Devansh Arpit · Felix Draxler · Min Lin · Fred Hamprecht · Yoshua Bengio · Aaron Courville
  • 2019 Oral: Parameter-Efficient Transfer Learning for NLP »
    Neil Houlsby · Andrei Giurgiu · Stanislaw Jastrzebski · Bruna Morrone · Quentin de Laroussilhe · Andrea Gesmundo · Mona Attariyan · Sylvain Gelly
  • 2019 Oral: State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations »
    Alex Lamb · Jonathan Binas · Anirudh Goyal · Sandeep Subramanian · Ioannis Mitliagkas · Yoshua Bengio · Michael Mozer
  • 2019 Poster: Manifold Mixup: Better Representations by Interpolating Hidden States »
    Vikas Verma · Alex Lamb · Christopher Beckham · Amir Najafi · Ioannis Mitliagkas · David Lopez-Paz · Yoshua Bengio
  • 2019 Poster: GMNN: Graph Markov Neural Networks »
    Meng Qu · Yoshua Bengio · Jian Tang
  • 2019 Oral: GMNN: Graph Markov Neural Networks »
    Meng Qu · Yoshua Bengio · Jian Tang
  • 2019 Oral: Manifold Mixup: Better Representations by Interpolating Hidden States »
    Vikas Verma · Alex Lamb · Christopher Beckham · Amir Najafi · Ioannis Mitliagkas · David Lopez-Paz · Yoshua Bengio
  • 2018 Poster: Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data »
    Amjad Almahairi · Sai Rajeswar · Alessandro Sordoni · Philip Bachman · Aaron Courville
  • 2018 Poster: Mutual Information Neural Estimation »
    Mohamed Belghazi · Aristide Baratin · Sai Rajeswar · Sherjil Ozair · Yoshua Bengio · R Devon Hjelm · Aaron Courville
  • 2018 Oral: Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data »
    Amjad Almahairi · Sai Rajeswar · Alessandro Sordoni · Philip Bachman · Aaron Courville
  • 2018 Oral: Mutual Information Neural Estimation »
    Mohamed Belghazi · Aristide Baratin · Sai Rajeswar · Sherjil Ozair · Yoshua Bengio · R Devon Hjelm · Aaron Courville
  • 2018 Poster: Focused Hierarchical RNNs for Conditional Sequence Processing »
    Rosemary Nan Ke · Konrad Zolna · Alessandro Sordoni · Zhouhan Lin · Adam Trischler · Yoshua Bengio · Joelle Pineau · Laurent Charlin · Christopher Pal
  • 2018 Oral: Focused Hierarchical RNNs for Conditional Sequence Processing »
    Rosemary Nan Ke · Konrad Zolna · Alessandro Sordoni · Zhouhan Lin · Adam Trischler · Yoshua Bengio · Joelle Pineau · Laurent Charlin · Christopher Pal
  • 2017 Workshop: Reproducibility in Machine Learning Research »
    Rosemary Nan Ke · Anirudh Goyal · Alex Lamb · Joelle Pineau · Samy Bengio · Yoshua Bengio
  • 2017 Poster: Sharp Minima Can Generalize For Deep Nets »
    Laurent Dinh · Razvan Pascanu · Samy Bengio · Yoshua Bengio
  • 2017 Talk: Sharp Minima Can Generalize For Deep Nets »
    Laurent Dinh · Razvan Pascanu · Samy Bengio · Yoshua Bengio