

Poster in Workshop: Humans, Algorithmic Decision-Making and Society: Modeling Interactions and Impact

Optimizing Machine Learning Explanations for Properties

Hiwot Belay Tadesse · Yaniv Yacoby · Weiwei Pan · Finale Doshi-Velez


Abstract:

Many explanation methods exist, along with works that quantify the extent to which their explanations satisfy properties such as faithfulness or robustness. For instance, SmoothGrad \cite{smilkovsmoothgrad2017} encourages robustness by averaging gradients around an input, whereas LIME \cite{ribeirowhy2016} encourages fidelity by fitting a linear approximation of the function. However, we demonstrate that these forms of encouragement do not consistently target their desired properties. In this paper, we \emph{directly optimize} explanations for desired properties. We show that, compared to SmoothGrad and LIME, we are able to: (1) produce explanations that are better optimized with respect to chosen properties, and (2) manage trade-offs between properties more explicitly and intuitively.
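To make the idea of directly optimizing an explanation concrete, here is a minimal sketch in PyTorch. It is not the paper's method: the function name `optimize_explanation`, the property losses, and the weights `lam_fidelity` and `lam_robust` are all illustrative assumptions. It treats an explanation vector `e` as a free parameter and minimizes a weighted sum of a LIME-like local-fidelity loss (how well `e` linearly predicts the model's output change under perturbations) and a simple robustness proxy (an L2 penalty on `e`).

```python
import torch

def optimize_explanation(f, x, lam_fidelity=1.0, lam_robust=0.1,
                         n_samples=64, sigma=0.1, steps=200, lr=0.05):
    """Directly optimize an explanation vector e for model f at input x.

    Assumptions (hypothetical, not from the paper):
      - f maps a batch of inputs of shape (batch, d) to scalars (batch,);
      - x is a 1-D tensor of shape (d,);
      - fidelity loss: e should linearly predict f's change under local
        Gaussian perturbations (a LIME-like criterion);
      - robustness loss: proxied by an L2 penalty on e.
    """
    x = x.detach()
    e = torch.zeros_like(x, requires_grad=True)  # explanation to optimize
    opt = torch.optim.Adam([e], lr=lr)
    fx = f(x.unsqueeze(0)).squeeze().detach()    # model output at x

    for _ in range(steps):
        # Sample Gaussian perturbations around x.
        noise = sigma * torch.randn(n_samples, *x.shape)
        xs = x.unsqueeze(0) + noise
        with torch.no_grad():
            fs = f(xs).squeeze()

        # Fidelity: the linear model e should explain local output changes.
        pred_delta = (noise * e).sum(dim=1)
        fidelity_loss = ((fs - fx) - pred_delta).pow(2).mean()

        # Robustness proxy: penalize large attributions.
        robust_loss = e.pow(2).sum()

        loss = lam_fidelity * fidelity_loss + lam_robust * robust_loss
        opt.zero_grad()
        loss.backward()
        opt.step()

    return e.detach()
```

The weights `lam_fidelity` and `lam_robust` make the trade-off between properties explicit, which is the intuition behind claim (2) in the abstract: rather than relying on a method's built-in bias toward a property, the practitioner states the objective and its trade-offs directly.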
