

Poster

Decomposing and Editing Predictions by Modeling the Computation Graph

Harshay Shah · Andrew Ilyas · Aleksander Madry


Abstract:

How does the internal computation of a machine learning model transform examples into predictions? We introduce a framework called component modeling for tackling this question by decomposing a prediction in terms of model components—simple functions that are the "building blocks" of model computation. We focus on a special case of this framework, the component attribution task, where the goal is to estimate the counterfactual impact of individual components on a given prediction. We then describe Coar, our method for estimating component attributions, and demonstrate its effectiveness for both vision and language models. Finally, we show that Coar attributions can directly enable effective model editing, allowing us to fix model errors, boost subpopulation robustness, and mitigate typographic attacks.
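To make the component attribution task concrete, here is a minimal toy sketch of a counterfactual question: how much does the prediction change when a single component is ablated? This is not Coar, whose procedure the abstract does not spell out; it is a plain leave-one-out baseline. The additive toy "model", the zero-ablation choice, and all names (`predict`, `leave_one_out_attributions`, `components`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical setup: a "component" is a simple function of the input, and the
# toy model's prediction is the sum of its component outputs.
Component = Callable[[Sequence[float]], float]


def predict(
    components: Sequence[Component],
    x: Sequence[float],
    ablated: FrozenSet[int] = frozenset(),
) -> float:
    """Toy model output; ablated components contribute 0 (an assumed ablation)."""
    return sum(0.0 if i in ablated else c(x) for i, c in enumerate(components))


def leave_one_out_attributions(
    components: Sequence[Component], x: Sequence[float]
) -> List[float]:
    """Counterfactual impact of each component: full output minus ablated output."""
    full = predict(components, x)
    return [
        full - predict(components, x, frozenset({i}))
        for i in range(len(components))
    ]


# Three illustrative components acting on a two-dimensional example.
components: List[Component] = [
    lambda x: 2.0 * x[0],
    lambda x: -1.0 * x[1],
    lambda x: 0.5 * x[0] * x[1],
]
x = [1.0, 3.0]
attributions = leave_one_out_attributions(components, x)
print(attributions)
```

In this toy case each attribution equals the ablated component's own output because the model is additive; real networks have interacting components, which is why estimating such counterfactual effects is a nontrivial task.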
