

Poster in Workshop: High-dimensional Learning Dynamics Workshop: The Emergence of Structure and Reasoning

Decomposing and Editing Predictions by Modeling Model Computation

Harshay Shah · Andrew Ilyas · Aleksander Madry


Abstract:

How does the internal computation of a machine learning model transform inputs into predictions? To tackle this question, we introduce a framework called component modeling for decomposing a model prediction in terms of its components (architectural "building blocks" such as convolution filters or attention heads). We focus on a special case of this framework, component attribution, where the goal is to estimate the counterfactual impact of individual components on a given prediction. We then present COAR, a scalable algorithm for estimating component attributions, and demonstrate its effectiveness across models, datasets and modalities. Finally, we show that COAR directly enables effective model editing.
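
To make the idea of component attribution concrete, here is a minimal sketch of one generic way to estimate the counterfactual impact of components: ablate random subsets of components, record the model output, and fit a linear model from ablation masks to outputs. The abstract does not spell out COAR's procedure, so this should be read as an illustration under stated assumptions rather than the authors' algorithm. The toy model, the function names `toy_model_output` and `estimate_attributions`, and the parameters `keep_prob` and `n_samples` are all hypothetical.

```python
import numpy as np


def toy_model_output(mask, weights):
    """Hypothetical stand-in for a model's output on one example.

    mask[i] is True when component i is kept and False when it is ablated.
    The output is a weighted sum of kept components plus one small
    interaction term, so it is only approximately linear in the mask.
    """
    active = mask.astype(float)
    return float(active @ weights + 0.1 * active[0] * active[1])


def estimate_attributions(output_fn, n_components, n_samples=2000,
                          keep_prob=0.9, seed=0):
    """Estimate per-component attributions by regressing outputs on masks.

    Assumption: a linear function of the ablation mask is a reasonable
    surrogate for the model output on this example.
    """
    rng = np.random.default_rng(seed)
    # Each row is a random ablation: True means the component is kept.
    masks = rng.random((n_samples, n_components)) < keep_prob
    outputs = np.array([output_fn(m) for m in masks])
    # Append a column of ones so the fit includes an intercept.
    design = np.hstack([masks.astype(float), np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(design, outputs, rcond=None)
    return coef[:-1], coef[-1]


# Toy ground-truth contributions of four hypothetical components.
weights = np.array([2.0, -1.0, 0.5, 0.0])
attributions, intercept = estimate_attributions(
    lambda m: toy_model_output(m, weights), n_components=4
)
print(np.round(attributions, 2))
```

In this toy setting the fitted coefficients land close to the ground-truth weights, with a small shift on the first two components caused by the interaction term. Under the same assumptions, an "edit" could be sketched as ablating the components whose attributions push a prediction in an unwanted direction.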
