Timezone: »
Modern meta-learning approaches for image classification rely on increasingly deep networks to achieve state-of-the-art performance, making batch normalization an essential component of meta-learning pipelines. However, the hierarchical nature of the meta-learning setting presents several challenges that can render conventional batch normalization ineffective, giving rise to the need to rethink normalization in this setting. We evaluate a range of approaches to batch normalization for meta-learning scenarios, and develop a novel approach that we call TaskNorm. Experiments on fourteen datasets demonstrate that the choice of batch normalization has a dramatic effect on both classification accuracy and training time for both gradient based- and gradient-free meta-learning approaches. Importantly, TaskNorm is found to consistently improve performance. Finally, we provide a set of best practices for normalization that will allow fair comparison of meta-learning algorithms.
Author Information
John Bronskill (University of Cambridge)
Jonathan Gordon (University of Cambridge)
James Requeima (University of Cambridge)
Sebastian Nowozin (Microsoft Research)
Richard E Turner (University of Cambridge)
Richard Turner holds a Lectureship (equivalent to US Assistant Professor) in Computer Vision and Machine Learning in the Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, UK. He is a Fellow of Christ's College Cambridge. Previously, he held an EPSRC Postdoctoral research fellowship which he spent at both the University of Cambridge and the Laboratory for Computational Vision, NYU, USA. He has a PhD degree in Computational Neuroscience and Machine Learning from the Gatsby Computational Neuroscience Unit, UCL, UK and a M.Sci. degree in Natural Sciences (specialism Physics) from the University of Cambridge, UK. His research interests include machine learning, signal processing and developing probabilistic models of perception.
More from the Same Authors
-
2021 : Attacking Few-Shot Classifiers with Adversarial Support Poisoning »
Elre Oldewage · John Bronskill · Richard E Turner -
2023 : Beyond Intuition, a Framework for Applying GPs to Real-World Data »
Kenza Tazi · Jihao Andreas Lin · ST John · Hong Ge · Richard E Turner · Ross Viljoen · Alex Gardner -
2023 : Modeling Accurate Long Rollouts with Temporal Neural PDE Solvers »
Phillip Lippe · Bastiaan Veeling · Paris Perdikaris · Richard E Turner · Johannes Brandstetter -
2021 Workshop: Uncertainty and Robustness in Deep Learning »
Balaji Lakshminarayanan · Dan Hendrycks · Sharon Li · Jasper Snoek · Silvia Chiappa · Sebastian Nowozin · Thomas Dietterich -
2020 Poster: The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks »
Jakub Swiatkowski · Kevin Roth · Bastiaan Veeling · Linh Tran · Joshua V Dillon · Jasper Snoek · Stephan Mandt · Tim Salimans · Rodolphe Jenatton · Sebastian Nowozin -
2020 Poster: Scalable Exact Inference in Multi-Output Gaussian Processes »
Wessel Bruinsma · Eric Perim Martins · William Tebbutt · Scott Hosking · Arno Solin · Richard E Turner -
2020 Poster: How Good is the Bayes Posterior in Deep Neural Networks Really? »
Florian Wenzel · Kevin Roth · Bastiaan Veeling · Jakub Swiatkowski · Linh Tran · Stephan Mandt · Jasper Snoek · Tim Salimans · Rodolphe Jenatton · Sebastian Nowozin -
2019 : Networking Lunch (provided) + Poster Session »
Abraham Stanway · Alex Robson · Aneesh Rangnekar · Ashesh Chattopadhyay · Ashley Pilipiszyn · Benjamin LeRoy · Bolong Cheng · Ce Zhang · Chaopeng Shen · Christian Schroeder · Christian Clough · Clement DUHART · Clement Fung · Cozmin Ududec · Dali Wang · David Dao · di wu · Dimitrios Giannakis · Dino Sejdinovic · Doina Precup · Duncan Watson-Parris · Gege Wen · George Chen · Gopal Erinjippurath · Haifeng Li · Han Zou · Herke van Hoof · Hillary A Scannell · Hiroshi Mamitsuka · Hongbao Zhang · Jaegul Choo · James Wang · James Requeima · Jessica Hwang · Jinfan Xu · Johan Mathe · Jonathan Binas · Joonseok Lee · Kalai Ramea · Kate Duffy · Kevin McCloskey · Kris Sankaran · Lester Mackey · Letif Mones · Loubna Benabbou · Lynn Kaack · Matthew Hoffman · Mayur Mudigonda · Mehrdad Mahdavi · Michael McCourt · Mingchao Jiang · Mohammad Mahdi Kamani · Neel Guha · Niccolo Dalmasso · Nick Pawlowski · Nikola Milojevic-Dupont · Paulo Orenstein · Pedram Hassanzadeh · Pekka Marttinen · Ramesh Nair · Sadegh Farhang · Samuel Kaski · Sandeep Manjanna · Sasha Luccioni · Shuby Deshpande · Soo Kim · Soukayna Mouatadid · Sunghyun Park · Tao Lin · Telmo Felgueira · Thomas Hornigold · Tianle Yuan · Tom Beucler · Tracy Cui · Volodymyr Kuleshov · Wei Yu · yang song · Ydo Wexler · Yoshua Bengio · Zhecheng Wang · Zhuangfang Yi · Zouheir Malki -
2019 Poster: EDDI: Efficient Dynamic Discovery of High-Value Information with Partial VAE »
Chao Ma · Sebastian Tschiatschek · Konstantina Palla · Jose Miguel Hernandez-Lobato · Sebastian Nowozin · Cheng Zhang -
2019 Oral: EDDI: Efficient Dynamic Discovery of High-Value Information with Partial VAE »
Chao Ma · Sebastian Tschiatschek · Konstantina Palla · Jose Miguel Hernandez-Lobato · Sebastian Nowozin · Cheng Zhang -
2018 Poster: The Mirage of Action-Dependent Baselines in Reinforcement Learning »
George Tucker · Surya Bhupatiraju · Shixiang Gu · Richard E Turner · Zoubin Ghahramani · Sergey Levine -
2018 Oral: The Mirage of Action-Dependent Baselines in Reinforcement Learning »
George Tucker · Surya Bhupatiraju · Shixiang Gu · Richard E Turner · Zoubin Ghahramani · Sergey Levine -
2018 Poster: Which Training Methods for GANs do actually Converge? »
Lars Mescheder · Andreas Geiger · Sebastian Nowozin -
2018 Poster: Structured Evolution with Compact Architectures for Scalable Policy Optimization »
Krzysztof Choromanski · Mark Rowland · Vikas Sindhwani · Richard E Turner · Adrian Weller -
2018 Oral: Which Training Methods for GANs do actually Converge? »
Lars Mescheder · Andreas Geiger · Sebastian Nowozin -
2018 Oral: Structured Evolution with Compact Architectures for Scalable Policy Optimization »
Krzysztof Choromanski · Mark Rowland · Vikas Sindhwani · Richard E Turner · Adrian Weller -
2017 Poster: Magnetic Hamiltonian Monte Carlo »
Nilesh Tripuraneni · Mark Rowland · Zoubin Ghahramani · Richard E Turner -
2017 Poster: Parallel and Distributed Thompson Sampling for Large-scale Accelerated Exploration of Chemical Space »
Jose Miguel Hernandez-Lobato · James Requeima · Edward Pyzer-Knapp · Alan Aspuru-Guzik -
2017 Talk: Magnetic Hamiltonian Monte Carlo »
Nilesh Tripuraneni · Mark Rowland · Zoubin Ghahramani · Richard E Turner -
2017 Talk: Parallel and Distributed Thompson Sampling for Large-scale Accelerated Exploration of Chemical Space »
Jose Miguel Hernandez-Lobato · James Requeima · Edward Pyzer-Knapp · Alan Aspuru-Guzik -
2017 Poster: Sequence Tutor: Conservative fine-tuning of sequence generation models with KL-control »
Natasha Jaques · Shixiang Gu · Dzmitry Bahdanau · Jose Miguel Hernandez-Lobato · Richard E Turner · Douglas Eck -
2017 Talk: Sequence Tutor: Conservative fine-tuning of sequence generation models with KL-control »
Natasha Jaques · Shixiang Gu · Dzmitry Bahdanau · Jose Miguel Hernandez-Lobato · Richard E Turner · Douglas Eck -
2017 Poster: Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks »
Lars Mescheder · Sebastian Nowozin · Andreas Geiger -
2017 Talk: Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks »
Lars Mescheder · Sebastian Nowozin · Andreas Geiger