Poster in Workshop: Methods and Opportunities at Small Scale (MOSS)
Decomposed Learning: An Avenue for Mitigating Grokking
Gabryel Mason-Williams · Israel Mason-Williams
Keywords: [ compression ] [ grokking ] [ SVD ] [ linear algebra ] [ optimisation ]
Abstract:
Grokking is a delayed transition from memorisation to generalisation in neural networks. It challenges perspectives on efficient learning, particularly in structured tasks and small-data regimes. We explore grokking in modular arithmetic, treating it as a training pathology. Using Singular Value Decomposition (SVD), we change the representation of each weight matrix $W$ into the product of three matrices, $U$, $\Sigma$, and $V^T$. Through empirical evaluations on the modular addition task, we show that this decomposed representation significantly reduces the effect of grokking and, in some cases, eliminates it.
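The factorisation underlying this reparameterisation is standard linear algebra and can be sketched with NumPy: a weight matrix $W$ is decomposed into $U$, $\Sigma$, and $V^T$, and the product of the three factors recovers $W$. This is only a minimal illustration of the decomposition itself; how the factors are initialised and trained in the authors' method is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy weight matrix, standing in for one layer of a network
# trained on a task such as modular addition.
W = rng.standard_normal((8, 5))

# Thin SVD: with full_matrices=False, U is (8, 5), S is (5,), Vt is (5, 5).
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Multiplying the three factors back together recovers W (up to float
# error), so representing the layer by (U, S, Vt) changes the
# parameterisation without changing the function the layer computes.
W_reconstructed = U @ np.diag(S) @ Vt
assert np.allclose(W, W_reconstructed)
```

In a decomposed-learning setup, the three factors would replace $W$ as the trainable parameters of the layer.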