Keywords: [ DL: Attention Mechanisms ] [ MISC: Scalable Algorithms ] [ MISC: Supervised Learning ] [ MISC: Online Learning, Active Learning and Bandits ] [ MISC: Sequential, Network, and Time Series Modeling ] [ MISC: Transfer, Multitask and Meta-learning ] [ DL: Algorithms ] [ DL: Graph Neural Networks ] [ DL: Robustness ] [ DL: Self-Supervised Learning ] [ DL: Sequential Models, Time series ]
Inspired by the Lottery Ticket Hypothesis, which holds that competitive subnetworks exist within a dense network, we propose a continual learning method called Winning SubNetworks (WSN), which sequentially learns and selects an optimal subnetwork for each task. Specifically, WSN jointly learns the model weights and a task-adaptive binary mask for each task's subnetwork, selecting a small set of weights to activate (the winning ticket) while reusing weights of prior subnetworks. The method is inherently immune to catastrophic forgetting because each selected subnetwork does not interfere with the others. The binary masks generated for each winning ticket are encoded into a single N-bit binary mask and then compressed with Huffman coding, so that network capacity grows sub-linearly with the number of tasks.
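The mask-encoding step can be illustrated with a minimal sketch. It assumes that the N-bit mask holds one bit per task for each weight, so that bit t of a weight's code records whether task t uses that weight; the abstract does not spell out this layout. The function names (`pack_masks`, `unpack_mask`, `huffman_table`, `encode`, `decode`) and the toy masks are hypothetical and are not taken from the WSN implementation.

```python
import heapq
from collections import Counter


def pack_masks(masks):
    """Combine per-task binary masks into one integer code per weight.

    masks[t][i] is 1 if weight i belongs to the subnetwork of task t.
    Bit t of codes[i] stores that flag (assumed layout).
    """
    codes = [0] * len(masks[0])
    for t, mask in enumerate(masks):
        for i, bit in enumerate(mask):
            codes[i] |= (bit & 1) << t
    return codes


def unpack_mask(codes, task):
    """Recover the binary mask of a single task from the packed codes."""
    return [(c >> task) & 1 for c in codes]


def huffman_table(symbols):
    """Build a prefix-free code table {symbol: bitstring} from frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, partial code table).
    heap = [(n, k, {s: ""}) for k, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, t1 = heapq.heappop(heap)
        n2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(codes, table):
    """Concatenate the Huffman codewords of all weight codes."""
    return "".join(table[c] for c in codes)


def decode(bits, table):
    """Invert encode() by matching prefix-free codewords greedily."""
    inverse = {v: k for k, v in table.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return out


# Toy example: 3 tasks, 8 weights; later tasks reuse some earlier weights.
masks = [
    [1, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 1, 0, 0],
]
codes = pack_masks(masks)
table = huffman_table(codes)
bits = encode(codes, table)
```

In this toy case, half of the weights are unused by every task and share the code 0, so Huffman coding assigns that symbol a short codeword and the encoded string is shorter than the 3 bits per weight of the uncompressed layout. How much real WSN masks compress depends on how the per-weight codes are distributed, which this sketch does not model.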