
Revisiting Architecture-aware Knowledge Distillation: Smaller Models and Faster Search
Taehyeon Kim · Heesoo Myeong · Se-Young Yun

Sat Jul 23 07:00 AM (PDT)

Knowledge Distillation (KD) has recently emerged as a popular method for compressing neural networks. Recent studies have proposed generalized distillation methods that search for a student model's parameters and architecture simultaneously. However, these methods demand substantial computation for the architecture search and restrict their search space to convolutional blocks. This paper introduces a new algorithm, coined Trust Region Aware architecture search to Distill knowledge Effectively (TRADE), which rapidly finds effective student architectures derived from several state-of-the-art architectures using a trust-region Bayesian optimization approach. Experimental results show that TRADE consistently outperforms both the conventional NAS approach and pre-defined architectures under KD training.
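The core search mechanism the abstract refers to, trust-region Bayesian optimization, maintains a local region around the best configuration found so far, expanding it after consecutive improvements and shrinking it after consecutive failures. The sketch below illustrates that trust-region control loop on a continuous toy objective; it is not the paper's implementation, and it substitutes uniform random proposals inside the trust region for the Gaussian-process surrogate that full Bayesian optimization (e.g., TuRBO-style methods) would use. All names (`trust_region_search`, the success/failure thresholds, the sphere objective) are illustrative assumptions.

```python
import random

def trust_region_search(objective, dim, lo, hi, n_iters=300, n_cand=10, seed=0):
    """Minimize `objective` over [lo, hi]^dim with a trust-region search.

    A simplified stand-in for trust-region Bayesian optimization:
    uniform sampling inside the region replaces GP-posterior proposals.
    """
    rng = random.Random(seed)
    length = (hi - lo) / 2.0        # current trust-region edge length
    length_min = (hi - lo) / 256.0  # below this, reset the region
    best_x = [rng.uniform(lo, hi) for _ in range(dim)]
    best_y = objective(best_x)
    succ = fail = 0
    for _ in range(n_iters):
        # Propose candidates inside the trust region around the incumbent,
        # clipped to the global bounds.
        cands = [[min(hi, max(lo, c + rng.uniform(-length / 2, length / 2)))
                  for c in best_x] for _ in range(n_cand)]
        x = min(cands, key=objective)
        y = objective(x)
        if y < best_y:              # success: accept and count toward expansion
            best_x, best_y = x, y
            succ, fail = succ + 1, 0
        else:                       # failure: count toward contraction
            succ, fail = 0, fail + 1
        if succ >= 3:               # expand after repeated successes
            length, succ = min(length * 2.0, hi - lo), 0
        if fail >= 3:               # shrink after repeated failures
            length, fail = length / 2.0, 0
        if length < length_min:     # region collapsed: restart its size
            length = (hi - lo) / 2.0
    return best_x, best_y

# Toy usage: minimize the sphere function in 2-D.
x, y = trust_region_search(lambda v: sum(t * t for t in v), dim=2, lo=-5.0, hi=5.0)
```

In the paper's setting the search space is discrete (blocks of state-of-the-art student architectures) rather than a continuous box, so the trust region would bound edits to an architecture encoding instead of coordinates; the expand/shrink/restart logic is the transferable part.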

Author Information

Taehyeon Kim (KAIST)

I’m a Ph.D. candidate in the Graduate School of AI at Korea Advanced Institute of Science and Technology (KAIST), advised by Prof. Se-Young Yun, and a member of OSI Lab. During my study, I interned at Qualcomm AI ADAS (Seoul, South Korea, 2021). I received a B.S. in Mathematics from KAIST in 2018. My research has investigated trustworthy and real-world AI/ML challenges. Specifically, my interests include the optimization for training deep neural networks, automated neural architecture search, automated hyperparameter search, learning with noisy labels, model compression, federated learning, and precipitation nowcasting. My research has been presented at several conferences and organizations.

Heesoo Myeong (Qualcomm Korea YH)
Se-Young Yun (KAIST)
