Plasticity Activation via Polar Operator: A Plug-in Method for Balancing Stability and Plasticity
Guodong Zheng ⋅ Enneng Yang ⋅ Xiaoyan Wang ⋅ Feihong He ⋅ Yihan Chen ⋅ Quan Zheng ⋅ Peng Wang ⋅ Li Shen
Abstract
Continual learning (CL) seeks models that acquire new knowledge while avoiding catastrophic forgetting. However, many methods that mitigate forgetting constrain parameter updates and thereby reduce model plasticity. We revisit the singular value spectrum of gradients in representative CL methods and show that they commonly exhibit singular value collapse, where only a small subset of gradient directions drives parameter updates. Motivated by this observation, we propose \textbf{P}lasticity \textbf{A}ctivation via \textbf{P}olar \textbf{O}perator (PAPO), a plug-in that preserves the dominant directions responsible for mitigating forgetting while reactivating previously suppressed directions to enhance plasticity. Concretely, PAPO modifies the gradient $\mathbf{G}$ as $\mathbf{G}\leftarrow \mathbf{G}+\lambda \cdot \operatorname{polar}(\mathbf{G})$, which shifts every singular value upward by $\lambda$ without changing the singular vectors, thereby lifting near-zero singular values in particular. To avoid the cost of explicit singular value decomposition, we approximate the polar factor using the iteration-dependent Polar Express scheme, which relies only on matrix multiplications and additions. In our empirical evaluation on both vision and language benchmarks, incorporating PAPO yields consistent improvements. In particular, on MiniImageNet, integrating PAPO into ER, MAS, GPM and TRGP produces substantial accuracy gains of $9.01\%$, $4.76\%$, $8.90\%$ and $9.19\%$, respectively.
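The gradient modification described above can be sketched in a few lines of NumPy. This is an illustrative sketch only: it computes the polar factor via an explicit SVD for clarity, whereas the paper replaces the SVD with the matrix-multiplication-only Polar Express iteration; the function names and the value of $\lambda$ are placeholders, not the authors' implementation.

```python
import numpy as np

def polar_factor(G):
    """Polar factor of G, i.e. the orthogonal part U V^T of the SVD
    G = U diag(s) V^T. Computed via explicit SVD here for illustration;
    the paper instead approximates this with the Polar Express iteration,
    which uses only matrix multiplications and additions."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def papo_update(G, lam=0.1):
    """PAPO gradient modification: G <- G + lam * polar(G).

    Since polar(G) = U V^T shares G's singular vectors, the sum equals
    U diag(s + lam) V^T: every singular value is shifted up by lam while
    the singular vectors are unchanged, reactivating near-zero directions.
    """
    return G + lam * polar_factor(G)
```

A quick way to see the effect is to compare singular values before and after: for a random gradient matrix, the spectrum of `papo_update(G, lam)` is exactly the original spectrum shifted by `lam`.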