Preconditioning Neural Tangent Kernel for Adaptive Optimization
Abstract
The Neural Tangent Kernel (NTK) is a theoretical framework for understanding the training dynamics of neural networks. However, standard NTK and its variants fail to accurately describe the finetuning of foundation models, as they neglect the preconditioning effects of adaptive gradient methods. To bridge this gap, we propose the Optimizer Aware Kernel (OAK), which incorporates the optimizer's influence into the standard NTK framework via a preconditioner estimation technique. Furthermore, we conduct an analysis to answer when and why the kernel regime fails in finetuning. We derive explicit error bounds showing that the collapse of the kernel regime is driven primarily by cumulative training effects and the task discrepancy between pretraining and finetuning. Theoretically, we justify OAK's preconditioner estimation by bounding its error term. Empirically, experiments on various model architectures demonstrate both the effectiveness of the OAK method and the validity of our arguments on kernel regime collapse.
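To fix intuition for how a preconditioner enters the kernel, a minimal sketch follows; the symbols $J$, $P$, and the specific form $J(x)\,P\,J(x')^{\top}$ are illustrative assumptions on our part, not the paper's exact formulation. Writing $J(x) = \nabla_{\theta} f_{\theta}(x)$ for the network Jacobian at input $x$, a preconditioned empirical NTK can be contrasted with the standard one as

\[
K_{\mathrm{OAK}}(x, x') \;=\; J(x)\, P\, J(x')^{\top},
\qquad
K_{\mathrm{NTK}}(x, x') \;=\; J(x)\, J(x')^{\top},
\]

where $P$ is an estimate of the optimizer's preconditioner (for instance, the diagonal matrix induced by an adaptive method's second-moment statistics). The standard NTK is recovered as the special case $P = I$, which corresponds to plain gradient descent.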