Understanding, Not Memorising: Hypernetwork-Based Weight Compression for Low-Resource Edge Deployment
Abstract
Deploying machine learning models on severely resource-constrained edge hardware remains a central challenge for communities in the Global South, where cloud connectivity is unreliable or prohibitively expensive. Existing compression methods such as pruning and quantisation shrink models by discarding or coarsening information. This is analogous to handing a student a smaller, blurrier photocopy of a textbook: recall is brittle, and the full artefact is still required. We propose HyperCompress, a weight-space autoencoder built on a small hypernetwork that "understands" a pre-trained 1D convolutional neural network (CNN) by encoding its entire parameter set into a compact latent code. At inference time, only the 128-byte latent code and a lightweight shared decoder need to reside on the device; the full weight tensors are regenerated on demand. On a bearing-fault classification benchmark, HyperCompress achieves a 4,000x reduction in per-model storage, from 512 KB of weights to the 128 B latent code, with an accuracy drop of less than 0.5 percent, outperforming pruning and INT8 quantisation at identical storage budgets. We release code and compressed model artefacts to lower barriers for resource-aware ML research in low-resource settings.
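To make the decode-on-demand idea concrete, the following is a minimal sketch in plain Python, not the released implementation. It assumes decimal kilobytes (512 KB = 512,000 bytes), a latent code stored as 128 unsigned bytes, and a chunk-wise decoder conditioned on a position code. The layer shapes, the chunk size, the hidden width, and the names `make_decoder`, `chunk_embedding`, and `decode_layer` are all hypothetical, and the decoder here is randomly initialised rather than trained.

```python
import math
import random

LATENT_BYTES = 128  # latent code size stated in the abstract
CHUNK = 64          # weights emitted per decoder call (assumption)
HIDDEN = 16         # decoder hidden width (assumption)

# Hypothetical layer shapes for a small 1D CNN: (out_ch, in_ch, kernel).
LAYER_SHAPES = {
    "conv1": (16, 1, 7),
    "conv2": (32, 16, 5),
    "fc": (4, 32, 1),
}


def compression_ratio(original_bytes, compressed_bytes):
    """Storage reduction factor, counting only the per-model payload."""
    return original_bytes / compressed_bytes


def layer_size(shape):
    """Number of weights in a layer of the given shape."""
    out_ch, in_ch, kernel = shape
    return out_ch * in_ch * kernel


def make_decoder(seed=0):
    """Randomly initialised stand-in for a trained shared decoder."""
    rng = random.Random(seed)
    w_in = [[rng.gauss(0, 0.1) for _ in range(LATENT_BYTES)]
            for _ in range(HIDDEN)]
    w_out = [[rng.gauss(0, 0.1) for _ in range(HIDDEN)]
             for _ in range(CHUNK)]
    return w_in, w_out


def chunk_embedding(index):
    """Fixed sinusoidal code telling the decoder which chunk to emit."""
    return [math.sin((index + 1) * (k + 1) / HIDDEN) for k in range(HIDDEN)]


def decode_layer(latent, decoder, name):
    """Regenerate one layer's flat weight list from the latent code."""
    w_in, w_out = decoder
    # Dequantise the latent bytes to floats in [-1, 1).
    z = [(b - 128) / 128.0 for b in latent]
    # Project the latent once; this is shared by every chunk.
    base = [sum(w * x for w, x in zip(row, z)) for row in w_in]
    # Index of this layer's first chunk, so chunk positions are unique.
    first = 0
    for other, shape in LAYER_SHAPES.items():
        if other == name:
            break
        first += math.ceil(layer_size(shape) / CHUNK)
    n = layer_size(LAYER_SHAPES[name])
    weights = []
    for c in range(math.ceil(n / CHUNK)):
        emb = chunk_embedding(first + c)
        h = [math.tanh(b + e) for b, e in zip(base, emb)]
        weights.extend(sum(w * x for w, x in zip(row, h)) for row in w_out)
    return weights[:n]  # drop padding from the final chunk


# Usage: only `latent` and `decoder` need to be stored on the device.
latent = bytes(range(LATENT_BYTES))
decoder = make_decoder(seed=0)
conv2_weights = decode_layer(latent, decoder, "conv2")
print(len(conv2_weights))                        # 32 * 16 * 5 = 2560
print(compression_ratio(512_000, LATENT_BYTES))  # 4000.0
```

Decoding layer by layer means that only one layer's weights need to be held in memory at a time, at the cost of recomputing them on each use. The ratio above counts only the latent code; the shared decoder adds a fixed storage cost on top of it.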