
Does Continual Learning Equally Forget All Parameters?
Haiyan Zhao · Tianyi Zhou · Guodong Long · Jing Jiang · Chengqi Zhang

Fri Jul 22 08:00 AM -- 08:15 AM (PDT)

Continual learning (CL) on neural networks suffers from catastrophic forgetting due to distribution or task shift. In this paper, we study which parts of neural nets are more prone to forgetting by investigating their training dynamics during CL. We discover that only a few modules (e.g., batch-norm, last layer, earlier convolutional layers) are more task-specific and change sensitively between tasks, while others can be shared across tasks as common knowledge. Hence, we attribute forgetting mainly to the former and find that finetuning them on only a small buffer at the end of any CL method can bring non-trivial improvement. Because these modules have few parameters, such "Forgetting Prioritized Finetuning (FPF)" is efficient and only requires a small buffer to retain the previous tasks. We further develop an even simpler replay-free method that applies FPF k times during CL to replace the costly every-step replay. Surprisingly, this "k-FPF" performs comparably to FPF and outperforms state-of-the-art CL methods while significantly reducing computational overhead and cost. In experiments on several benchmarks of class- and domain-incremental CL, FPF consistently improves existing CL methods by a large margin, and k-FPF further excels in efficiency without degrading accuracy.
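As a rough illustration of the idea of finding modules that change sensitively between tasks, the sketch below ranks modules by how much their parameters move when training on a new task and marks only the top few as trainable for a later finetuning pass. It is a minimal sketch under stated assumptions, not the authors' procedure: the relative L2 change used as the sensitivity score, the function names `relative_change` and `select_sensitive_modules`, the module names, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def relative_change(before, after):
    """L2 norm of the parameter change, divided by the L2 norm of the old parameters.

    Assumption: this ratio stands in for a module's sensitivity to a task shift.
    """
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(after, before)))
    base = math.sqrt(sum(b ** 2 for b in before))
    return diff / (base + 1e-12)  # small epsilon avoids division by zero


def select_sensitive_modules(params_before, params_after, top_k):
    """Return the names of the top_k modules with the largest relative change."""
    scores = {
        name: relative_change(params_before[name], params_after[name])
        for name in params_before
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


# Toy parameter snapshots taken before and after training on a new task.
# These numbers are made up for illustration; they are not measured results.
before = {
    "conv1.weight": [0.5, -0.5, 0.25, 0.25],
    "layer3.conv.weight": [1.0, -1.0, 2.0, -2.0],
    "bn1.weight": [1.0, 1.0],
    "fc.weight": [0.2, -0.1, 0.4],
}
after = {
    "conv1.weight": [0.7, -0.3, 0.25, 0.45],
    "layer3.conv.weight": [1.01, -1.0, 2.0, -2.01],
    "bn1.weight": [1.5, 0.6],
    "fc.weight": [0.5, -0.4, 0.1],
}

selected, scores = select_sensitive_modules(before, after, top_k=3)

# Only the selected modules would be updated when finetuning on the small buffer;
# every other module stays frozen.
trainable = {name: (name in selected) for name in before}

for name in before:
    print(f"{name}: score={scores[name]:.3f} trainable={trainable[name]}")
```

In a real framework, the `trainable` mapping would translate into freezing every parameter outside the selected modules before running a few finetuning epochs on the buffered samples.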

Author Information

Haiyan Zhao (University of Technology Sydney)
Tianyi Zhou (University of Washington)

Tianyi Zhou is currently a PhD student at the Paul G. Allen School of Computer Science and Engineering, University of Washington. He is supervised by Prof. Jeff Bilmes and Prof. Carlos Guestrin. He has published ~50 papers at NeurIPS, ICML, ICLR, AISTATS, NAACL, KDD, ICDM, IJCAI, AAAI, ISIT, Machine Learning Journal, IEEE TIP, IEEE TNNLS, IEEE TKDE, etc., with ~1700 citations. He is the recipient of the Best Student Paper Award at ICDM 2013.

Guodong Long (University of Technology Sydney)
Jing Jiang (University of Technology Sydney)
Chengqi Zhang (University of Technology Sydney)

More from the Same Authors