This paper considers the learning of logical (Boolean) functions with a focus on the generalization-on-the-unseen (GOTU) setting, a strong case of out-of-distribution generalization. This is motivated by the fact that the rich combinatorial nature of data in certain reasoning tasks (e.g., arithmetic/logic) makes representative data sampling challenging, and learning successfully under GOTU gives a first vignette of an 'extrapolating' or 'reasoning' learner. We then study how different network architectures trained by (S)GD perform under GOTU and provide both theoretical and experimental evidence that for a class of network models including instances of Transformers, random features models, and diagonal linear networks, a min-degree interpolator is learned on the unseen. We also provide evidence that other instances with larger learning rates or mean-field networks reach leaky min-degree solutions. These findings lead to two implications: (1) we provide an explanation for the length generalization problem (e.g., Anil et al. 2022); (2) we introduce a curriculum learning algorithm called Degree-Curriculum that learns monomials more efficiently by incrementing supports.
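The abstract's key object, the min-degree interpolator, can be made concrete with a small worked example. The sketch below is not the authors' code: the helper `monomials`, the target f(x) = x1·x2, and the choice of frozen coordinate are assumptions made for illustration, and an explicit min-norm least-squares fit over monomial features stands in for the trained networks the paper analyzes.

```python
# Illustrative sketch (not the paper's code): compute the min-degree
# interpolator of a Boolean target on a "seen" half-cube and evaluate it
# on the unseen half, as in the GOTU setting.
import itertools
import numpy as np

def monomials(X, degree):
    """Feature map of all monomials prod_{i in S} x_i with |S| <= degree."""
    feats = [np.ones(len(X))]  # the constant (empty-set) monomial
    for d in range(1, degree + 1):
        for S in itertools.combinations(range(X.shape[1]), d):
            feats.append(np.prod(X[:, list(S)], axis=1))
    return np.stack(feats, axis=1)

n = 3
target = lambda X: X[:, 0] * X[:, 1]          # assumed target: f(x) = x1 * x2
cube = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
seen = cube[cube[:, 0] == 1]                  # GOTU: coordinate x1 frozen to +1
unseen = cube[cube[:, 0] == -1]               # held out entirely during training

# Raise the allowed degree until the seen data is interpolated exactly;
# the min-norm least-squares fit at that degree is the min-degree interpolator.
for degree in range(n + 1):
    Phi = monomials(seen, degree)
    coef, *_ = np.linalg.lstsq(Phi, target(seen), rcond=None)
    if np.allclose(Phi @ coef, target(seen)):
        break

print("minimal interpolating degree:", degree)  # 1, not 2
print("prediction on unseen:", monomials(unseen, degree) @ coef)
print("true target on unseen:", target(unseen))
```

On the seen half-cube x1 = +1, so the degree-1 monomial x2 already matches x1·x2; the min-degree interpolator therefore outputs x2 and disagrees with the true target on every unseen point (where x1·x2 = -x2). This is the failure mode the abstract attributes to the listed architectures, and the one the Degree-Curriculum algorithm is designed to address.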
Author Information
Emmanuel Abbe
Samy Bengio (Apple MLR)
Aryo Lotfi (EPFL)
Kevin Rizk (EPFL)
Related Events (a corresponding poster, oral, or spotlight)
- 2023 Poster: Generalization on the Unseen, Logic Reasoning and Degree Curriculum
  Thu. Jul 27th, 12:00 -- 01:30 AM, Exhibit Hall 1 #433
More from the Same Authors
- 2023: Panel on Reasoning Capabilities of LLMs
  Guy Van den Broeck · Ishita Dasgupta · Subbarao Kambhampati · Jiajun Wu · Xi Victoria Lin · Samy Bengio · Beliz Gunel
- 2023: Generalization on the Unseen, Logic Reasoning and Degree Curriculum
  Samy Bengio
- 2023 Panel: The Societal Impacts of AI
  Sanmi Koyejo · Samy Bengio · Ashia Wilson · Kirikowhai Mikaere · Joelle Pineau
- 2021 Poster: Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels
  Eran Malach · Pritish Kamath · Emmanuel Abbe · Nati Srebro
- 2021 Spotlight: Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels
  Eran Malach · Pritish Kamath · Emmanuel Abbe · Nati Srebro
- 2020 Affinity Workshop: New In ML
  Zhen Xu · Sparkle Russell-Puleri · Zhengying Liu · Sinead A Williamson · Matthias W Seeger · Wei-Wei Tu · Samy Bengio · Isabelle Guyon
- 2019 Workshop: Identifying and Understanding Deep Learning Phenomena
  Hanie Sedghi · Samy Bengio · Kenji Hata · Aleksander Madry · Ari Morcos · Behnam Neyshabur · Maithra Raghu · Ali Rahimi · Ludwig Schmidt · Ying Xiao
- 2019 Poster: Area Attention
  Yang Li · Lukasz Kaiser · Samy Bengio · Si Si
- 2019 Oral: Area Attention
  Yang Li · Lukasz Kaiser · Samy Bengio · Si Si
- 2018 Poster: Fast Decoding in Sequence Models Using Discrete Latent Variables
  Lukasz Kaiser · Samy Bengio · Aurko Roy · Ashish Vaswani · Niki Parmar · Jakob Uszkoreit · Noam Shazeer
- 2018 Oral: Fast Decoding in Sequence Models Using Discrete Latent Variables
  Lukasz Kaiser · Samy Bengio · Aurko Roy · Ashish Vaswani · Niki Parmar · Jakob Uszkoreit · Noam Shazeer
- 2017 Workshop: Reproducibility in Machine Learning Research
  Rosemary Nan Ke · Anirudh Goyal · Alex Lamb · Joelle Pineau · Samy Bengio · Yoshua Bengio
- 2017 Poster: Device Placement Optimization with Reinforcement Learning
  Azalia Mirhoseini · Hieu Pham · Quoc Le · Benoit Steiner · Mohammad Norouzi · Rasmus Larsen · Yuefeng Zhou · Naveen Kumar · Samy Bengio · Jeff Dean
- 2017 Talk: Device Placement Optimization with Reinforcement Learning
  Azalia Mirhoseini · Hieu Pham · Quoc Le · Benoit Steiner · Mohammad Norouzi · Rasmus Larsen · Yuefeng Zhou · Naveen Kumar · Samy Bengio · Jeff Dean
- 2017 Poster: Sharp Minima Can Generalize For Deep Nets
  Laurent Dinh · Razvan Pascanu · Samy Bengio · Yoshua Bengio
- 2017 Talk: Sharp Minima Can Generalize For Deep Nets
  Laurent Dinh · Razvan Pascanu · Samy Bengio · Yoshua Bengio