A Robust Optimization Guided Pruning Framework for Vision and Large Language Models
Abstract
Pruning is a common approach to reduce the memory footprint and inference cost of large vision and language models. As these architectures continue to scale, one-shot pruning methods, i.e., approaches that prune the network without any retraining, have become increasingly attractive. Popular one-shot pruning methods (e.g., WoodFisher, CAP, SparseGPT, and ALPS) typically optimize a quadratic objective under sparsity constraints. In practice, however, this objective is affected by multiple sources of uncertainty, including noise in the calibration data and variability introduced by algorithmic updates. To address these issues, we introduce RobOP, a robust optimization framework that explicitly accounts for such uncertainties. RobOP is modular and flexible: it can be combined with any existing pruning method through simple modifications motivated by our theoretical framework. We demonstrate that, by accounting for uncertainty, RobOP improves upon prior pruning approaches. Our framework applies tractably across a range of stylized uncertainty sets, enabling robust one-shot pruning at scale.
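To make the quadratic objective mentioned above concrete, the following is a minimal sketch (not the paper's RobOP method) of layer-wise one-shot pruning as a quadratic objective under a sparsity constraint: min_w ||Xw - Xw0||^2 subject to ||w||_0 <= k, approximated by choosing a support via weight magnitude and then least-squares refitting the surviving weights on calibration data. All names here (prune_layer, X, w0, k) are illustrative assumptions.

```python
import numpy as np

def prune_layer(X, w0, k):
    """Illustrative one-shot prune of a single linear layer.

    Approximates  min_w ||X w - X w0||^2  s.t.  ||w||_0 <= k
    by (1) keeping the k largest-magnitude entries of the dense
    weights w0 and (2) refitting those entries by least squares
    on the calibration activations X.
    """
    support = np.argsort(-np.abs(w0))[:k]  # indices of surviving weights
    w = np.zeros_like(w0)
    # Refit surviving weights: min over w_S of ||X[:, S] w_S - X w0||^2.
    target = X @ w0
    w_s, *_ = np.linalg.lstsq(X[:, support], target, rcond=None)
    w[support] = w_s
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 16))   # hypothetical calibration activations
w0 = rng.normal(size=16)        # hypothetical dense layer weights
w = prune_layer(X, w0, k=4)     # 75% sparse pruned weights
```

The least-squares refit step never does worse (on the calibration data) than simply zeroing the small weights, since it minimizes the same quadratic objective over the chosen support; methods such as SparseGPT and ALPS additionally optimize which support to choose.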