Continual Learners are Incremental Model Generalizers
Jaehong Yoon · Sung Ju Hwang · Yue Cao

Wed Jul 26 05:00 PM -- 06:30 PM (PDT) @ Exhibit Hall 1 #100

Motivated by the efficiency and rapid convergence of pre-trained models on downstream tasks, this paper extensively studies Continual Learning (CL) models as pre-trainers. We find that, in both supervised and unsupervised CL, the transfer quality of the learned representations does not noticeably degrade; fine-tuning performance instead increases gradually. This is because CL models can learn improved task-generic features while readily forgetting task-specific knowledge. Based on this observation, we suggest a new unsupervised CL framework with masked modeling, which aims to capture fluent task-generic representations during training. Furthermore, we propose a new fine-tuning scheme, GLobal Attention Discretization (GLAD), that preserves rich task-generic representations while solving downstream tasks. The model fine-tuned with GLAD achieves competitive performance and can itself serve as a good pre-trained model. We believe this paper breaks the barriers between the pre-training and fine-tuning steps and leads to a sustainable learning framework in which the continual learner incrementally improves model generalization, yielding better transfer to unseen tasks.

Author Information

Jaehong Yoon (KAIST)
Sung Ju Hwang (UNIST)
Yue Cao (Beijing Academy of Artificial Intelligence)

More from the Same Authors