

Poster

Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry

Mohammed Adnan · Rohan Jain · Ekansh Sharma · Rahul G. Krishnan · Yani Ioannou

East Exhibition Hall A-B #E-2106
[ Project Page ]
Tue 15 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

The Lottery Ticket Hypothesis (LTH) suggests there exists a sparse LTH mask and weights that achieve the same generalization performance as the dense model while using significantly fewer parameters. However, finding an LTH solution is computationally expensive, and an LTH sparsity mask does not generalize to other random weight initializations. Recent work has suggested that neural networks trained from random initialization find solutions within the same basin modulo permutation, and has proposed a method to align trained models within the same loss basin. We hypothesize that misalignment of basins is the reason why LTH masks do not generalize to new random initializations, and propose permuting the LTH mask to align with the new optimization basin when performing sparse training from a different random initialization. We empirically show a significant increase in generalization when sparse training from random initialization with the permuted mask, compared to using the non-permuted LTH mask, on multiple datasets (CIFAR-10/100 & ImageNet) and models (VGG11 & ResNet20/50).
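The central operation described above is straightforward: once a channel-wise permutation aligning the new initialization's loss basin with the original LTH solution has been found (e.g., via a weight-matching procedure as in Git Re-Basin), that same permutation is applied to the LTH mask before sparse training begins. The sketch below is a minimal illustration of the mask-permutation step for a plain sequential network, not the authors' implementation; the function name, the `perms` mapping, and the assumption that permutations act on output channels are illustrative choices.

```python
# Minimal sketch (assumed interface, not the authors' code): apply a per-layer
# channel permutation to an LTH mask so it aligns with a new random
# initialization's basin. Assumes the permutations were already computed
# (e.g., by weight matching between the LTH solution and the new init).
import numpy as np

def permute_mask(mask: dict, perms: dict) -> dict:
    """Permute a layer-wise binary mask along output/input channel axes.

    mask  : {layer_name: binary array of shape (out_ch, in_ch, ...)}
    perms : {layer_name: permutation of that layer's output channels}
    Each layer's input-channel axis is permuted by the previous layer's
    output permutation, keeping the mask consistent across layers
    (simplification: layers are assumed to be purely sequential).
    """
    permuted, prev_perm = {}, None
    for name, m in mask.items():
        m = m.copy()
        p_out = perms.get(name)
        if p_out is not None:       # permute output channels (axis 0)
            m = m[p_out]
        if prev_perm is not None:   # permute input channels (axis 1)
            m = m[:, prev_perm]
        permuted[name] = m
        prev_perm = p_out
    return permuted
```

Sparse training then proceeds as usual from the new random initialization, but with the permuted mask fixing which weights are kept.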

Lay Summary:

Modern artificial intelligence (AI) systems are incredibly powerful but often require massive amounts of computing power and data to train. This makes them expensive and out of reach for many researchers and developers. To address this, scientists have been exploring “sparser” AI models—systems that use only a small fraction of their potential connections—making them much more efficient to train and run.

However, a major hurdle is that a sparse model setup that works well with one starting point for training often fails when training begins from a different starting point. Our research identifies the root cause: misalignment. Think of it like using a key (the sparse setup) on a lock that has been rotated slightly—it just doesn’t fit.

To solve this, we developed a method to “re-align” the sparse structure so it matches the patterns of a new starting point. This adjustment dramatically improves the performance of sparse models trained from different starting points, making them nearly as effective as their original versions.

Our findings make it easier and more practical to develop leaner, more efficient AI systems, paving the way for broader accessibility and innovation in AI research.
