ICML 2019 Zero-Shot Knowledge Distillation in Deep Networks Oral


Zero-Shot Knowledge Distillation in Deep Networks

Gaurav Kumar Nayak · Konda Reddy Mopuri · Vaisakh Shaj · Venkatesh Babu Radhakrishnan · Anirban Chakraborty

Session: Deep Learning Theory


Abstract:

Knowledge distillation deals with the problem of training a smaller model from a high-capacity model so as to retain most of its performance. The source and target models are generally referred to as the Teacher and Student models, respectively. Existing approaches use either the training data or meta-data extracted from it to train the Student. However, accessing the dataset on which the Teacher was trained may not always be feasible, for example when the dataset is very large or when it raises privacy or safety concerns (e.g., biometric or medical data). Therefore, in this paper, we propose a novel data-free method to train the Student from the Teacher. Without using even meta-data, we extract Data Impressions from the parameters of the Teacher model and use them as surrogates for the original training samples to transfer the Teacher's learning to the Student via knowledge distillation. Hence we dub our method "Zero-Shot Knowledge Distillation". We demonstrate on multiple benchmark datasets that our framework achieves generalization performance competitive with that obtained using the actual training data samples.
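
The abstract does not state the training objective, so the following is only a minimal sketch of the transfer step, assuming the commonly used temperature-softened distillation loss (the KL divergence between the Teacher's and the Student's softened outputs). The function names `softmax` and `distillation_loss` and the default temperature are illustrative assumptions, not taken from the paper. The logits stand in for Teacher and Student outputs on one surrogate sample; how Data Impressions are synthesized is not shown.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) computed on temperature-softened outputs.

    Assumption: this is the generic distillation objective, used here
    for illustration; the paper's exact loss may differ.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the Teacher
    q = softmax(student_logits, temperature)  # Student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits for one surrogate input (illustrative values only).
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = distillation_loss(teacher, student)
```

The loss is zero when the Student reproduces the Teacher's softened distribution and positive otherwise, so minimizing it over surrogate samples pushes the Student toward the Teacher's outputs.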
