Poster
Zero-Shot Knowledge Distillation in Deep Networks
Gaurav Kumar Nayak · Konda Reddy Mopuri · Vaisakh Shaj · Venkatesh Babu Radhakrishnan · Anirban Chakraborty
Keywords: [ Algorithms ] [ Architectures ] [ Computer Vision ] [ Deep Learning Theory ]
Knowledge distillation deals with the problem of training a smaller model (\emph{Student}) from a high-capacity source model (\emph{Teacher}) so as to retain most of its performance. Existing approaches use either the training data or meta-data extracted from it in order to train the \emph{Student}. However, accessing the dataset on which the \emph{Teacher} has been trained may not always be feasible, e.g., if the dataset is very large or poses privacy or safety concerns (biometric or medical data). Hence, in this paper, we propose a novel data-free method to train the \emph{Student} from the \emph{Teacher}. Without using any meta-data, we synthesize \emph{Data Impressions} from the complex \emph{Teacher} model and utilize these as surrogates for the original training data samples to transfer its learning to the \emph{Student} via knowledge distillation. We therefore dub our method ``Zero-Shot Knowledge Distillation'' and demonstrate on multiple benchmark datasets that our framework achieves generalization performance competitive with distillation using the actual training data samples.
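To make the two stages of the approach concrete, below is a minimal PyTorch sketch: first, synthetic \emph{Data Impressions} are obtained by optimizing random inputs so that the \emph{Teacher}'s softened outputs match target label distributions sampled from a Dirichlet prior; second, the \emph{Student} is trained on these surrogates with a standard distillation loss. The function names, input shape, Dirichlet concentration, and temperature values here are illustrative assumptions rather than the authors' exact implementation (in particular, the paper models the Dirichlet concentration using class similarities derived from the \emph{Teacher}, which is simplified here).

```python
# Hedged sketch of data-free distillation, assuming a pre-trained `teacher`,
# a smaller `student`, 3x32x32 inputs, and 10 classes.
import torch
import torch.nn.functional as F


def synthesize_data_impressions(teacher, num_classes=10, n_per_class=2,
                                steps=200, lr=0.05, beta=1.0, temp=20.0):
    """Optimize random inputs so the Teacher's softened softmax matches
    target label distributions sampled from a Dirichlet prior."""
    teacher.eval()
    impressions, targets = [], []
    for c in range(num_classes):
        # Concentration peaked on class c (a simplification of the paper's
        # class-similarity-based concentration).
        conc = torch.full((num_classes,), beta)
        conc[c] += 10.0
        y = torch.distributions.Dirichlet(conc).sample((n_per_class,))
        x = torch.randn(n_per_class, 3, 32, 32, requires_grad=True)
        opt = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            log_p = F.log_softmax(teacher(x) / temp, dim=1)
            loss = F.kl_div(log_p, y, reduction="batchmean")
            loss.backward()
            opt.step()
        impressions.append(x.detach())
        targets.append(y)
    return torch.cat(impressions), torch.cat(targets)


def distill_step(student, teacher, x, optimizer, temp=20.0):
    """One knowledge-distillation step of the Student on surrogate inputs x."""
    teacher.eval()
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / temp, dim=1)
    optimizer.zero_grad()
    log_p = F.log_softmax(student(x) / temp, dim=1)
    loss = F.kl_div(log_p, soft_targets, reduction="batchmean") * temp ** 2
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the \emph{Student} never sees the original training set: it is trained only on the synthesized impressions, with the \emph{Teacher}'s softened outputs serving as targets.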