Demystifying When Pruning Works via Representation Hierarchies
Abstract
Network pruning, which removes less important parameters or structural components, is often expected to improve efficiency while preserving performance. However, this expectation does not hold consistently across language tasks: pruned models can perform well on non-generative tasks but frequently fail in generative settings. To demystify how these discrepancies arise, we analyze pruning from a representation-hierarchy perspective, decomposing the internal computation of language models into three sequential spaces: \textit{embedding} (hidden representations), \textit{logit} (pre-softmax outputs), and \textit{probability} (post-softmax distributions). While representations in the embedding and logit spaces are largely robust to pruning-induced perturbations, the subsequent nonlinear transformation from logits to probabilities amplifies these deviations, and their persistence across autoregressive time steps leads to substantial degradation during generation. By contrast, the stability of the categorical-token probability subspace, together with the robustness of the embedding space, explains why pruning remains effective for non-generative tasks such as retrieval and multiple-choice selection. Our representation-level analysis disentangles the effects of pruning across tasks and offers practical guidance for its application. Code will be released upon acceptance.
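To make the logit-to-probability amplification concrete, here is a minimal sketch with hypothetical numbers (a toy 5-token vocabulary, not drawn from the paper's experiments). It shows that a perturbation that is negligible in logit space can, after softmax, produce a proportionally larger probability deviation and even flip the greedy next-token choice; in autoregressive decoding such a flip changes every subsequent input token, which is the compounding effect described above.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits for a toy 5-token vocabulary; the top two are nearly tied,
# as often happens at "pivotal" decoding steps.
logits = np.array([3.00, 2.98, 1.0, 0.0, -1.0])

# A small pruning-induced perturbation (illustrative magnitude, not measured).
delta = np.array([-0.03, 0.03, 0.0, 0.0, 0.0])

p = softmax(logits)          # pre-pruning distribution
q = softmax(logits + delta)  # post-pruning distribution

# The deviation is ~1% in logit space...
print("relative logit deviation :",
      np.linalg.norm(delta) / np.linalg.norm(logits))

# ...but roughly 3x larger in probability space, and the argmax flips.
print("max relative prob change :", np.max(np.abs(q - p) / p))
print("greedy token flipped     :", bool(p.argmax() != q.argmax()))
```

Under greedy decoding the flipped argmax feeds a different token back into the model at the next step, so the per-step deviation persists and compounds over the sequence; a non-generative task that only compares a handful of candidate scores (e.g., multiple-choice selection) never triggers this feedback loop.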