Knowledge Distillation (KD) consists of transferring "knowledge" from one machine learning model (the teacher) to another (the student). Commonly, the teacher is a high-capacity model with formidable performance, while the student is more compact. By transferring knowledge, one hopes to benefit from the student's compactness without sacrificing too much performance. We study KD from a new perspective: rather than compressing models, we train students parameterized identically to their teachers. Surprisingly, these Born-Again Networks (BANs) outperform their teachers significantly, both on computer vision and language modeling tasks. Our experiments with BANs based on DenseNets demonstrate state-of-the-art performance on the CIFAR-10 (3.5%) and CIFAR-100 (15.5%) datasets, measured by validation error. Additional experiments explore two distillation objectives: (i) Confidence-Weighted by Teacher Max (CWTM) and (ii) Dark Knowledge with Permuted Predictions (DKPP). Both methods elucidate the essential components of KD, demonstrating the effect of the teacher outputs on both predicted and non-predicted classes.
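To make the three training signals named in the abstract more concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the authors' implementation: the function names (`softmax`, `cross_entropy`, `kd_loss`, `cwtm_weight`, `dkpp_targets`) are hypothetical, no temperature scaling is used, and the exact weighting and normalization used in the paper may differ.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, student_logits):
    """Cross-entropy between target_probs and softmax(student_logits)."""
    m = max(student_logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in student_logits))
    return -sum(t * (z - log_norm) for t, z in zip(target_probs, student_logits))


def kd_loss(teacher_logits, student_logits):
    """Plain distillation: the student matches the teacher's full distribution."""
    return cross_entropy(softmax(teacher_logits), student_logits)


def cwtm_weight(teacher_logits):
    """Assumed CWTM-style sample weight: the teacher's maximum probability."""
    return max(softmax(teacher_logits))


def dkpp_targets(teacher_logits, rng):
    """Assumed DKPP-style targets: keep the teacher's top probability in place
    and randomly permute the probabilities of all other classes."""
    probs = softmax(teacher_logits)
    top = max(range(len(probs)), key=probs.__getitem__)
    rest = [p for i, p in enumerate(probs) if i != top]
    rng.shuffle(rest)
    rest_iter = iter(rest)
    return [probs[i] if i == top else next(rest_iter) for i in range(len(probs))]


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]   # hypothetical teacher logits for one sample
    student = [0.5, 0.2, 0.3]   # hypothetical student logits for the same sample
    print("KD loss:", kd_loss(teacher, student))
    print("CWTM weight:", cwtm_weight(teacher))
    print("DKPP targets:", dkpp_targets(teacher, random.Random(0)))
```

In this sketch, the CWTM-style weight uses only the teacher's confidence in its predicted class, while the DKPP-style targets keep that confidence but scramble the information carried by the non-predicted classes. Comparing the two is one way to separate the contributions of predicted and non-predicted classes that the abstract refers to.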
Author Information
Tommaso Furlanello (University of Southern California)
Zachary Lipton (Carnegie Mellon University)
Michael Tschannen (ETH Zurich)
Laurent Itti (University of Southern California)
Anima Anandkumar (Amazon)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Poster: Born Again Neural Networks
  Fri. Jul 13th 04:15 -- 07:00 PM Room Hall B #106
More from the Same Authors
- 2021: Do You See What I See? A Comparison of Radiologist Eye Gaze to Computer Vision Saliency Maps for Chest X-ray Classification
  Jesse Kim · Helen Zhou · Zachary Lipton
- 2022: Domain Adaptation under Open Set Label Shift
  Saurabh Garg · Sivaraman Balakrishnan · Zachary Lipton
- 2022: Unsupervised Learning under Latent Label Shift
  Pranav Mani · Manley Roberts · Saurabh Garg · Zachary Lipton
- 2022: Characterizing Datapoints via Second-Split Forgetting
  Pratyush Maini · Saurabh Garg · Zachary Lipton · Zico Kolter
- 2022: Counterfactual Metrics for Auditing Black-Box Recommender Systems for Ethical Concerns
  Nil-Jana Akpinar · Liu Leqi · Dylan Hadfield-Menell · Zachary Lipton
- 2022: RiskyZoo: A Library for Risk-Sensitive Supervised Learning
  William Wong · Audrey Huang · Liu Leqi · Kamyar Azizzadenesheli · Zachary Lipton
- 2023: Lightweight Learner for Shared Knowledge Lifelong Learning
  Yunhao Ge · Yuecheng Li · Di Wu · Ao Xu · Adam Jones · Amanda Rios · Iordanis Fostiropoulos · Shixian Wen · Po-Hsuan Huang · Zachary W. Murdock · Gozde Sahin · Shuo Ni · Kiran Lekkala · Sumedh Sontakke · Laurent Itti
- 2023: Model-tuning Via Prompts Makes NLP Models Adversarially Robust
  Mrigank Raman · Pratyush Maini · Zico Kolter · Zachary Lipton · Danish Pruthi
- 2023: Complementary Benefits of Contrastive Learning and Self-Training Under Distribution Shift
  Saurabh Garg · Amrith Setlur · Zachary Lipton · Sivaraman Balakrishnan · Virginia Smith · Aditi Raghunathan
- 2023: Deep Equilibrium Based Neural Operators for Steady-State PDEs
  Tanya Marwah · Ashwini Pokle · Zico Kolter · Zachary Lipton · Jianfeng Lu · Andrej Risteski
- 2023: How to Cope with Gradual Data Drift?
  Rasool Fakoor · Jonas Mueller · Zachary Lipton · Pratik Chaudhari · Alex Smola
- 2023: TMARS: Improving Visual Representations by Circumventing Text Feature Learning
  Pratyush Maini · Sachin Goyal · Zachary Lipton · Zico Kolter · Aditi Raghunathan
- 2023: Building One-class Detector for Anything: Open-vocabulary Zero-shot OOD Detection Using Text-image Models
  Yunhao Ge · Jie Ren · Jiaping Zhao · Kaifeng Chen · Andrew Gallagher · Laurent Itti · Balaji Lakshminarayanan
- 2023: Identifying Inequity in Treatment Allocation
  Yewon Byun · Dylan Sam · Zachary Lipton · Bryan Wilder
- 2023: Conditional Diffusion Replay for Continual Learning in Medical Settings
  Yewon Byun · Saurabh Garg · Sanket Vaibhav Mehta · Praveer Singh · Jayashree Kalpathy-Cramer · Bryan Wilder · Zachary Lipton
- 2023: SCIS 2023 Panel, The Future of Generalization: Scale, Safety and Beyond
  Maggie Makar · Samuel Bowman · Zachary Lipton · Adam Gleave
- 2023 Poster: Neural Network Approximations of PDEs Beyond Linearity: A Representational Perspective
  Tanya Marwah · Zachary Lipton · Jianfeng Lu · Andrej Risteski
- 2023 Poster: Can Neural Network Memorization Be Localized?
  Pratyush Maini · Michael Mozer · Hanie Sedghi · Zachary Lipton · Zico Kolter · Chiyuan Zhang
- 2023 Poster: RLSbench: Domain Adaptation Under Relaxed Label Shift
  Saurabh Garg · Nick Erickson · James Sharpnack · Alex Smola · Sivaraman Balakrishnan · Zachary Lipton
- 2023 Poster: CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets
  Zachary Novack · Julian McAuley · Zachary Lipton · Saurabh Garg
- 2022 Workshop: Principles of Distribution Shift (PODS)
  Elan Rosenfeld · Saurabh Garg · Shibani Santurkar · Jamie Morgenstern · Hossein Mobahi · Zachary Lipton · Andrej Risteski
- 2022 Poster: Supervised Learning with General Risk Functionals
  Liu Leqi · Audrey Huang · Zachary Lipton · Kamyar Azizzadenesheli
- 2022 Spotlight: Supervised Learning with General Risk Functionals
  Liu Leqi · Audrey Huang · Zachary Lipton · Kamyar Azizzadenesheli
- 2021: RL Explainability & Interpretability Panel
  Ofra Amir · Finale Doshi-Velez · Alan Fern · Zachary Lipton · Omer Gottesman · Niranjani Prasad
- 2021 Poster: Correcting Exposure Bias for Link Recommendation
  Shantanu Gupta · Hao Wang · Zachary Lipton · Yuyang Wang
- 2021 Spotlight: Correcting Exposure Bias for Link Recommendation
  Shantanu Gupta · Hao Wang · Zachary Lipton · Yuyang Wang
- 2021 Poster: RATT: Leveraging Unlabeled Data to Guarantee Generalization
  Saurabh Garg · Sivaraman Balakrishnan · Zico Kolter · Zachary Lipton
- 2021 Oral: RATT: Leveraging Unlabeled Data to Guarantee Generalization
  Saurabh Garg · Sivaraman Balakrishnan · Zico Kolter · Zachary Lipton
- 2021 Poster: On Proximal Policy Optimization's Heavy-tailed Gradients
  Saurabh Garg · Joshua Zhanson · Emilio Parisotto · Adarsh Prasad · Zico Kolter · Zachary Lipton · Sivaraman Balakrishnan · Ruslan Salakhutdinov · Pradeep Ravikumar
- 2021 Poster: Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning
  Sumedh Sontakke · Arash Mehrjou · Laurent Itti · Bernhard Schölkopf
- 2021 Spotlight: On Proximal Policy Optimization's Heavy-tailed Gradients
  Saurabh Garg · Joshua Zhanson · Emilio Parisotto · Adarsh Prasad · Zico Kolter · Zachary Lipton · Sivaraman Balakrishnan · Ruslan Salakhutdinov · Pradeep Ravikumar
- 2021 Spotlight: Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning
  Sumedh Sontakke · Arash Mehrjou · Laurent Itti · Bernhard Schölkopf
- 2020 Poster: Uncertainty-Aware Lookahead Factor Models for Quantitative Investing
  Lakshay Chauhan · John Alberg · Zachary Lipton
- 2019 Poster: Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment
  Yifan Wu · Ezra Winston · Divyansh Kaushik · Zachary Lipton
- 2019 Poster: What is the Effect of Importance Weighting in Deep Learning?
  Jonathon Byrd · Zachary Lipton
- 2019 Oral: Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment
  Yifan Wu · Ezra Winston · Divyansh Kaushik · Zachary Lipton
- 2019 Oral: What is the Effect of Importance Weighting in Deep Learning?
  Jonathon Byrd · Zachary Lipton
- 2019 Poster: High-Fidelity Image Generation With Fewer Labels
  Mario Lucic · Michael Tschannen · Marvin Ritter · Xiaohua Zhai · Olivier Bachem · Sylvain Gelly
- 2019 Oral: High-Fidelity Image Generation With Fewer Labels
  Mario Lucic · Michael Tschannen · Marvin Ritter · Xiaohua Zhai · Olivier Bachem · Sylvain Gelly
- 2018 Poster: Detecting and Correcting for Label Shift with Black Box Predictors
  Zachary Lipton · Yu-Xiang Wang · Alexander Smola
- 2018 Poster: StrassenNets: Deep Learning with a Multiplication Budget
  Michael Tschannen · Aran Khanna · Animashree Anandkumar
- 2018 Oral: StrassenNets: Deep Learning with a Multiplication Budget
  Michael Tschannen · Aran Khanna · Animashree Anandkumar
- 2018 Oral: Detecting and Correcting for Label Shift with Black Box Predictors
  Zachary Lipton · Yu-Xiang Wang · Alexander Smola