Sign Lock-In: Randomly Initialized Weight Signs Persist and Bottleneck Sub-Bit Model Compression
Akira Sakai ⋅ Yuma Ichikawa
Abstract
Sub-bit model compression seeks storage below one bit per weight, where the sign bit becomes a fixed-cost bottleneck as magnitudes are aggressively compressed. Across Transformers, CNNs, and MLPs, learned sign matrices resist low-rank compression and are spectrally indistinguishable from i.i.d. Rademacher baselines. Despite this apparent randomness, most weights keep their initialization signs, with flips occurring mainly through rare near-zero boundary crossings, **suggesting that the randomness in sign patterns is largely inherited from initialization.** We formalize this behavior with *sign lock-in theory*, a stopping-time analysis of sign flips under SGD noise. Under bounded updates and a rare re-entry condition on a small neighborhood of zero, the number of effective sign flips exhibits a geometric tail. Building on this mechanism, we introduce a gap-based initialization and a lightweight outward-drift regularizer that together reduce the effective flip rate to approximately $10^{-3}$ with only about a one-point increase in perplexity.
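As a rough illustration of the quantities the abstract refers to, the sketch below measures the sign-flip rate against initialization and implements one plausible hinge-style form of an outward-drift penalty. The function names, the band width `eps`, the coefficient `lam`, and the noisy-update toy loop are all illustrative assumptions, not the paper's actual method or training setup.

```python
import torch

def sign_flip_rate(w_init: torch.Tensor, w: torch.Tensor) -> float:
    """Fraction of weights whose current sign differs from its sign at init."""
    return (torch.sign(w_init) != torch.sign(w)).float().mean().item()

def outward_drift_penalty(w: torch.Tensor, s_init: torch.Tensor,
                          eps: float = 1e-2) -> torch.Tensor:
    """Element-wise hinge (an assumed form of the regularizer): zero once a
    weight sits at least `eps` from zero on the side of its initialization
    sign, growing linearly as the weight approaches or crosses the boundary."""
    return torch.relu(eps - s_init * w)

# Toy demonstration: updates driven by pure gradient noise plus the penalty,
# so the outward drift pulls weights back toward their initial sign whenever
# they re-enter the band around zero.
torch.manual_seed(0)
w_init = 0.02 * torch.randn(512, 512)  # small init scale so the zero band matters
s_init = torch.sign(w_init)
w = w_init.clone().requires_grad_()
opt = torch.optim.SGD([w], lr=1e-2)
lam = 1.0  # assumed penalty coefficient
for _ in range(200):
    opt.zero_grad()
    noise = torch.randn_like(w)  # stand-in for stochastic gradient noise
    loss = (w * noise).sum() + lam * outward_drift_penalty(w, s_init).sum()
    loss.backward()
    opt.step()
print(f"effective flip rate: {sign_flip_rate(w_init, w.detach()):.4f}")
```

Setting `lam = 0.0` in this toy recovers an unregularized random walk, under which far more weights cross zero; the hinge gradient instead pushes any weight inside the band outward along its initialization sign.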