
STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition
Yucheng Lu · Shivani Agrawal · Suvinay Subramanian · Oleg Rybakov · Chris De Sa · Amir Yazdanbakhsh

Wed Jul 26 02:00 PM -- 03:30 PM (PDT) @ Exhibit Hall 1 #409

Recent innovations in hardware (e.g., Nvidia A100) have motivated learning N:M structured sparsity masks from scratch for fast model inference. However, state-of-the-art learning recipes in this regime (e.g., SR-STE) were designed for non-adaptive optimizers like momentum SGD, and they incur a non-trivial accuracy drop for Adam-trained models such as attention-based LLMs. In this paper, we first demonstrate that this gap originates from a poorly estimated second moment (i.e., variance) in the Adam states given by the masked weights. We conjecture that learning N:M masks with Adam should take the critical regime of variance estimation into account. In light of this, we propose STEP, an Adam-aware recipe that learns N:M masks in two phases: first, STEP calculates a reliable variance estimate (precondition phase); subsequently, the variance remains fixed and is used as a precondition to learn N:M masks (mask-learning phase). STEP automatically identifies the switching point between the two phases by dynamically sampling variance changes over the training trajectory and testing the sample concentration. Empirically, we evaluate STEP and other baselines such as ASP and SR-STE on multiple tasks, including CIFAR classification, machine translation, and LLM fine-tuning (BERT-Base, GPT-2). We show that STEP mitigates the accuracy drop of baseline recipes and is robust to aggressive structured sparsity ratios.
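An N:M pattern keeps at most N nonzero weights in every group of M consecutive weights (2:4 is the pattern supported by the A100's sparse tensor cores). The abstract describes the two-phase recipe only at a high level, so the following is a minimal pure-Python sketch of the idea under stated assumptions, not the authors' STEP implementation. The helper `nm_mask`, the class `TwoPhaseAdam`, the magnitude-based top-N selection, the `window`/`tol` switching rule (a simple stand-in for the paper's sample-concentration test), and the SR-STE-style `decay` term on pruned weights are all hypothetical choices made for illustration.

```python
import math


def nm_mask(weights, n, m):
    """Return a 0/1 mask keeping the n largest-magnitude entries in each group of m."""
    if len(weights) % m != 0:
        raise ValueError("len(weights) must be a multiple of m")
    mask = [0] * len(weights)
    for start in range(0, len(weights), m):
        group = range(start, start + m)
        # Indices of the n entries with the largest |w| in this group of m.
        for i in sorted(group, key=lambda j: abs(weights[j]), reverse=True)[:n]:
            mask[i] = 1
    return mask


class TwoPhaseAdam:
    """Hypothetical two-phase optimizer: dense Adam until the variance estimate
    settles (precondition phase), then the frozen variance is used as a fixed
    preconditioner while an N:M mask is learned (mask-learning phase)."""

    def __init__(self, weights, n=2, m=4, lr=1e-2, beta1=0.9, beta2=0.999,
                 eps=1e-8, window=5, tol=1e-3, decay=2e-4):
        self.w = list(weights)             # dense weights, always kept
        self.n, self.m = n, m
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.window, self.tol, self.decay = window, tol, decay
        self.mom = [0.0] * len(self.w)     # first moment
        self.var = [0.0] * len(self.w)     # raw second moment
        self.v_hat = None                  # bias-corrected second moment
        self.changes = []                  # recent relative changes of v_hat
        self.frozen = False                # False: still in the precondition phase
        self.t = 0

    def masked_weights(self):
        """Weights the forward pass should use (dense until the switch)."""
        if not self.frozen:
            return list(self.w)
        mask = nm_mask(self.w, self.n, self.m)
        return [wi * mk for wi, mk in zip(self.w, mask)]

    def step(self, grad):
        """grad: gradient of the loss evaluated at masked_weights()."""
        self.t += 1
        self.mom = [self.b1 * mi + (1 - self.b1) * g for mi, g in zip(self.mom, grad)]
        if not self.frozen:
            self.var = [self.b2 * vi + (1 - self.b2) * g * g
                        for vi, g in zip(self.var, grad)]
            v_hat = [vi / (1 - self.b2 ** self.t) for vi in self.var]
            if self.v_hat is not None:
                # Relative L1 change of the variance estimate since the last step.
                diff = sum(abs(a - b) for a, b in zip(v_hat, self.v_hat))
                scale = sum(abs(b) for b in self.v_hat) + self.eps
                self.changes = (self.changes + [diff / scale])[-self.window:]
            self.v_hat = v_hat
            # Stand-in switching rule: the last `window` changes are all below tol.
            if len(self.changes) == self.window and max(self.changes) < self.tol:
                self.frozen = True
        mask = nm_mask(self.w, self.n, self.m) if self.frozen else [1] * len(self.w)
        bias1 = 1 - self.b1 ** self.t
        new_w = []
        for wi, mi, vi, mk in zip(self.w, self.mom, self.v_hat, mask):
            # Adam-style step; after the switch vi is the frozen variance.
            update = self.lr * (mi / bias1) / (math.sqrt(vi) + self.eps)
            # SR-STE-style term (assumption): shrink currently pruned weights.
            update += self.lr * self.decay * (1 - mk) * wi
            new_w.append(wi - update)
        self.w = new_w
```

In use, a caller would evaluate the loss at `masked_weights()`, pass the resulting gradient to `step`, and let the straight-through update land on the dense weights. Freezing `v_hat` means the per-coordinate step size `lr / (sqrt(v_hat) + eps)` stops reacting to gradients computed through the mask, which is one reading of the variance problem the abstract describes.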

Author Information

Yucheng Lu (Cornell University)
Shivani Agrawal (Google)
Suvinay Subramanian (Google)

[Suvinay Subramanian](http://suvinay.com) is a computer architect at Google building hardware systems (TPUs) to accelerate machine learning and AI. His expertise is in hardware-software codesign and sparsity in deep neural networks. He received a Ph.D. from MIT and a B.Tech from IIT Madras. He also co-hosts the [Computer Architecture Podcast](https://comparchpodcast.podbean.com/).

Oleg Rybakov (Google)
Chris De Sa (Cornell)
Amir Yazdanbakhsh (Google DeepMind)

My name is Amir Yazdanbakhsh. I joined Google Research as a Research Scientist in 2019, following a one-year AI residency. I am the co-founder and co-lead of the Machine Learning for Computer Architecture team. We leverage recent machine learning methods and advances to innovate on and design better hardware accelerators. The work of our team has been covered by media outlets including WIRED, ZDNet, AnalyticsInsight, and InfoQ. I am also interested in designing large-scale distributed systems for training machine learning models. To that end, I led the development of a massively scalable distributed reinforcement learning system that scales to TPU Pods and efficiently manages thousands of actors to solve complex, real-world tasks. As a case study, our team demonstrated how this highly scalable system enables reinforcement learning to accomplish chip placement in about an hour instead of the days or weeks it takes with human effort. I received my Ph.D. in computer science from the Georgia Institute of Technology. My Ph.D. work has been recognized by several awards, including the Microsoft PhD Fellowship and the Qualcomm Innovation Fellowship.
