

Poster in Workshop: High-dimensional Learning Dynamics Workshop: The Emergence of Structure and Reasoning

Why Pruning and Conditional Computation Work: A High-Dimensional Perspective

Erdem Koyuncu


Abstract:

We analyze the processes of pruning and conditional computation for a single neuron in the asymptotic learning regime of large input dimension and training set size. For this purpose, we introduce conditional neurons, which implement an early-exit strategy at the neuron level. Specifically, a conditional neuron first considers the local field induced by a subset of its inputs. If this sub-local field is strong enough, the remaining inputs are ignored, saving computation. Conditional neurons thus provide an archetype of the well-known early-exit and conditional-computation architectures. We formally analyze their generalization performance to understand why conditional computation is so effective at preserving performance despite a significantly reduced average amount of computation. In the process, we establish a concentration theorem for one-shot neuron-wise pruning, a technique that has recently been popularized in the context of large language models.
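As a rough illustration of the mechanism described in the abstract, the following Python sketch implements a single conditional neuron with a neuron-level early exit. The names `subset_size` and `threshold`, as well as the choice of a sign activation, are assumptions made for illustration; they are not details taken from the paper.

```python
import numpy as np

def conditional_neuron(x, w, subset_size, threshold):
    """Minimal sketch of a conditional neuron (assumed sign activation).

    The neuron first computes the local field induced by its first
    `subset_size` inputs. If the magnitude of this sub-local field
    exceeds `threshold`, the remaining inputs are ignored and the
    output is based on the sub-local field alone, saving computation.
    Otherwise, the full local field over all inputs is computed.
    """
    # Sub-local field induced by a subset of the inputs.
    sub_field = np.dot(w[:subset_size], x[:subset_size])
    if abs(sub_field) >= threshold:
        # Early exit: the remaining multiply-accumulates are skipped.
        return np.sign(sub_field)
    # Sub-local field too weak to decide: use the full local field.
    full_field = sub_field + np.dot(w[subset_size:], x[subset_size:])
    return np.sign(full_field)
```

Under this sketch, the average computation per input is governed by how often the early exit fires, which in turn depends on the threshold and on the statistics of the sub-local field; the abstract's analysis concerns exactly this trade-off in the high-dimensional regime.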
