Poster in Workshop: High-dimensional Learning Dynamics Workshop: The Emergence of Structure and Reasoning
Deep Networks Always Grok and Here is Why
Ahmed Imtiaz Humayun · Randall Balestriero · Richard Baraniuk
Grokking, or delayed generalization, is a phenomenon in which generalization in a deep neural network (DNN) occurs long after the network reaches near-zero training error. Previous studies have reported grokking in specific controlled settings, such as DNNs initialized with large-norm parameters or transformers trained on algorithmic datasets. We demonstrate that grokking is actually much more widespread and materializes in a wide range of practical settings, such as training a convolutional neural network (CNN) on CIFAR10 or a ResNet on Imagenette. We introduce the new concept of delayed robustness, whereby a DNN groks adversarial examples and becomes robust long after interpolation and/or generalization. We develop an analytical explanation for the emergence of both delayed generalization and delayed robustness based on the local complexity of a DNN's input-output mapping. Our local complexity measures the density of so-called "linear regions" (also known as spline partition regions) that tile the DNN input space.
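To make the idea of linear-region density concrete, the following is a minimal sketch and not the authors' estimator. It assumes a small, randomly initialized ReLU network (the hypothetical helpers `make_layer`, `activation_pattern`, and `local_region_count` are illustrative names, not from the paper) and uses a crude proxy: the number of distinct ReLU on/off patterns observed among points sampled in a box around an input. Each distinct pattern corresponds to a different region on which the network is affine, so the count is a lower bound on the number of linear regions that intersect the box.

```python
import random

def make_layer(n_in, n_out, rng):
    """Random dense layer (assumed Gaussian init): returns (weights, biases)."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
    return weights, biases

def activation_pattern(layers, x):
    """Return the on/off state of every ReLU unit for input x as a tuple of bools."""
    pattern = []
    h = x
    for weights, biases in layers:
        pre = [sum(w * v for w, v in zip(row, h)) + b
               for row, b in zip(weights, biases)]
        pattern.extend(p > 0.0 for p in pre)
        h = [max(p, 0.0) for p in pre]  # ReLU
    return tuple(pattern)

def local_region_count(layers, center, radius, n_samples, rng):
    """Count distinct activation patterns among points sampled uniformly
    from an axis-aligned box of half-width `radius` around `center`."""
    patterns = set()
    for _ in range(n_samples):
        point = [c + rng.uniform(-radius, radius) for c in center]
        patterns.add(activation_pattern(layers, point))
    return len(patterns)

rng = random.Random(0)
# Two hidden ReLU layers on a 2-D input; the output layer is omitted
# because only the hidden units determine the partition.
layers = [make_layer(2, 16, rng), make_layer(16, 16, rng)]
center = [0.0, 0.0]

small = local_region_count(layers, center, radius=0.01, n_samples=2000, rng=rng)
large = local_region_count(layers, center, radius=1.0, n_samples=2000, rng=rng)
print("patterns seen in small box:", small)
print("patterns seen in large box:", large)
```

Tracking such a count around training points over the course of training would give a rough picture of how the partition density changes locally. The sampling-based count is only a stand-in chosen for simplicity and scales poorly with input dimension.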