Motivated by the efficiency and rapid convergence of pre-trained models for solving downstream tasks, this paper extensively studies the impact of Continual Learning (CL) models as pre-trainers. We find that, in both supervised and unsupervised CL, fine-tuning performance does not noticeably degrade; rather, the transfer quality of the learned representations gradually increases. This is because CL models can learn improved task-general features while easily forgetting task-specific knowledge. Based on this observation, we suggest a new unsupervised CL framework with masked modeling, which aims to capture fluent, task-generic representations during training. Furthermore, we propose a new fine-tuning scheme, GLobal Attention Discretization (GLAD), that preserves rich task-generic representations while solving downstream tasks. A model fine-tuned with GLAD achieves competitive performance and can also serve as a good pre-trained model itself. We believe this paper breaks down the barrier between the pre-training and fine-tuning steps and leads to a sustainable learning framework in which the continual learner incrementally improves model generalization, yielding better transfer to unseen tasks.
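The abstract does not spell out the masked-modeling objective used for unsupervised continual pre-training, so the sketch below only illustrates the generic idea it refers to: an encoder-decoder reconstructs randomly masked patch embeddings, and the same model keeps pre-training on each incoming task's unlabeled data. This is a hedged, assumption-laden illustration in PyTorch, not the paper's framework or its GLAD scheme; all names (MaskedAutoencoder, mask_ratio, task_loaders) are hypothetical.

```python
# Illustrative sketch of generic masked-modeling pre-training, continued over a
# sequence of tasks. NOT the paper's actual framework or GLAD implementation.
import torch
import torch.nn as nn


class MaskedAutoencoder(nn.Module):
    """Toy encoder-decoder that reconstructs randomly masked patch embeddings."""

    def __init__(self, patch_dim: int = 768, hidden_dim: int = 256, mask_ratio: float = 0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.encoder = nn.Sequential(nn.Linear(patch_dim, hidden_dim), nn.GELU())
        self.decoder = nn.Linear(hidden_dim, patch_dim)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, patch_dim)
        mask = torch.rand(patches.shape[:2], device=patches.device) < self.mask_ratio
        visible = patches.masked_fill(mask.unsqueeze(-1), 0.0)  # hide masked patches
        recon = self.decoder(self.encoder(visible))
        # Reconstruction loss is computed on masked positions only.
        return ((recon - patches) ** 2)[mask].mean()


def continual_pretrain(model, task_loaders, lr=1e-4):
    """Continue masked pre-training on a sequence of per-task unlabeled loaders."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for loader in task_loaders:      # one loader per incoming task
        for patches in loader:       # unlabeled patch embeddings, shape (B, N, D)
            loss = model(patches)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```

In this reading, the continually pre-trained encoder is simply handed on to the next task or to fine-tuning, which is the sense in which the abstract describes the continual learner as an incrementally improving pre-trainer.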
Author Information
Jaehong Yoon (KAIST)
Sung Ju Hwang (KAIST)
Yue Cao (Beijing Academy of Artificial Intelligence)
More from the Same Authors
- 2021 : Entropy Weighted Adversarial Training »
  Minseon Kim · Jihoon Tack · Jinwoo Shin · Sung Ju Hwang
- 2021 : Consistency Regularization for Adversarial Robustness »
  Jihoon Tack · Sihyun Yu · Jongheon Jeong · Minseon Kim · Sung Ju Hwang · Jinwoo Shin
- 2023 : Generalizable Lightweight Proxy for Robust NAS against Diverse Perturbations »
  Hyeonjeong Ha · Minseon Kim · Sung Ju Hwang
- 2023 Poster: Personalized Subgraph Federated Learning »
  Jinheon Baek · Wonyong Jeong · Jiongdao Jin · Jaehong Yoon · Sung Ju Hwang
- 2023 Poster: Exploring Chemical Space with Score-based Out-of-distribution Generation »
  Seul Lee · Jaehyeong Jo · Sung Ju Hwang
- 2023 Poster: Revisiting Discriminative vs. Generative Classifiers: Theory and Implications »
  Chenyu Zheng · Guoqiang Wu · Fan Bao · Yue Cao · Chongxuan Li · Jun Zhu
- 2023 Poster: Scalable Set Encoding with Universal Mini-Batch Consistency and Unbiased Full Set Gradient Approximation »
  Jeffrey Willette · Seanie Lee · Bruno Andreis · Kenji Kawaguchi · Juho Lee · Sung Ju Hwang
- 2023 Poster: One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale »
  Fan Bao · Shen Nie · Kaiwen Xue · Chongxuan Li · Shi Pu · Yaole Wang · Gang Yue · Yue Cao · Hang Su · Jun Zhu
- 2023 Poster: Margin-based Neural Network Watermarking »
  Byungjoo Kim · Suyoung Lee · Seanie Lee · Son · Sung Ju Hwang
- 2022 Poster: Forget-free Continual Learning with Winning Subnetworks »
  Haeyong Kang · Rusty Mina · Sultan Rizky Hikmawan Madjid · Jaehong Yoon · Mark Hasegawa-Johnson · Sung Ju Hwang · Chang Yoo
- 2022 Poster: Bitwidth Heterogeneous Federated Learning with Progressive Weight Dequantization »
  Jaehong Yoon · Geon Park · Wonyong Jeong · Sung Ju Hwang
- 2022 Spotlight: Forget-free Continual Learning with Winning Subnetworks »
  Haeyong Kang · Rusty Mina · Sultan Rizky Hikmawan Madjid · Jaehong Yoon · Mark Hasegawa-Johnson · Sung Ju Hwang · Chang Yoo
- 2022 Spotlight: Bitwidth Heterogeneous Federated Learning with Progressive Weight Dequantization »
  Jaehong Yoon · Geon Park · Wonyong Jeong · Sung Ju Hwang
- 2021 Poster: Federated Continual Learning with Weighted Inter-client Transfer »
  Jaehong Yoon · Wonyong Jeong · GiWoong Lee · Eunho Yang · Sung Ju Hwang
- 2021 Spotlight: Federated Continual Learning with Weighted Inter-client Transfer »
  Jaehong Yoon · Wonyong Jeong · GiWoong Lee · Eunho Yang · Sung Ju Hwang
- 2017 Poster: Combined Group and Exclusive Sparsity for Deep Neural Networks »
  Jaehong Yoon · Sung Ju Hwang
- 2017 Talk: Combined Group and Exclusive Sparsity for Deep Neural Networks »
  Jaehong Yoon · Sung Ju Hwang