

Poster in the Workshop on Theoretical Foundations of Foundation Models (TF2M)

Hallmarks of Optimization Trajectories in Neural Networks and LLMs: Directional Exploration and Redundancy

Sidak Pal Singh · Bobby He · Thomas Hofmann · Bernhard Schölkopf


Abstract:

We propose a fresh take on understanding the mechanisms of neural networks by analyzing the rich directional structure of their optimization trajectories, represented by the pointwise parameters along training. To this end, we introduce a natural notion of the complexity of optimization trajectories that helps hallmark the directional nature of optimization in neural networks: when there is redundancy, and when there is exploration. We use the trajectory perspective to showcase the regularizing effect of scale on the directional nature of trajectories. As a by-product, we also observe an intriguing heterogeneity in the Q, K, V dynamics of the middle attention layers in LLMs, which is, however, homogenized by scale. Importantly, we put the observed directional redundancy to the test by demonstrating that training only the scalar batch-normalization parameters, starting some way into training, matches the performance of training the entire network, exhibiting the potential for hybrid optimization schemes geared towards efficiency.
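The abstract does not spell out how directional structure is measured; a minimal sketch of one natural instantiation (an assumption, not necessarily the paper's exact formulation) is to collect parameter checkpoints along training and compute their pairwise cosine similarities: high off-diagonal similarity would indicate directional redundancy, low similarity directional exploration.

```python
import torch

def flatten_params(model: torch.nn.Module) -> torch.Tensor:
    """Concatenate all parameters into one vector: a single trajectory point."""
    return torch.cat([p.detach().reshape(-1) for p in model.parameters()])

def trajectory_cosine_map(checkpoints: list[torch.Tensor]) -> torch.Tensor:
    """Pairwise cosine similarities between checkpointed parameter vectors.

    Returns a (T, T) map over T checkpoints; a near-constant map suggests
    directional redundancy, a rapidly decaying one suggests exploration.
    """
    X = torch.stack(checkpoints)                          # (T, d)
    X = X / X.norm(dim=1, keepdim=True).clamp_min(1e-12)  # unit-normalize
    return X @ X.T

# Usage: append flatten_params(model) to a list every k optimizer steps,
# then inspect trajectory_cosine_map(checkpoints). The same map restricted
# to individual Q, K, V weight matrices could probe the per-layer dynamics
# mentioned above.
```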
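For the redundancy test described above, a hedged PyTorch sketch follows: partway into training, freeze everything except the scalar batch-norm affine parameters (scale and shift) and continue optimizing only those. The optimizer choice and learning rate here are illustrative, not the authors' reported setup.

```python
import torch

def freeze_all_but_batchnorm(model: torch.nn.Module) -> list[torch.nn.Parameter]:
    """Disable gradients everywhere except BatchNorm scale/shift parameters."""
    for p in model.parameters():
        p.requires_grad_(False)
    bn_params = []
    for m in model.modules():
        if isinstance(m, (torch.nn.BatchNorm1d,
                          torch.nn.BatchNorm2d,
                          torch.nn.BatchNorm3d)):
            for p in m.parameters(recurse=False):  # weight (gamma), bias (beta)
                p.requires_grad_(True)
                bn_params.append(p)
    return bn_params

# e.g., after an initial phase of full-network training:
# optimizer = torch.optim.SGD(freeze_all_but_batchnorm(model), lr=0.01)
```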
