Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings

Jan Macdonald · Mathieu Besançon · Sebastian Pokutta

Hall E #936

Keywords: [ APP: Computer Vision ] [ OPT: First-order ] [ OPT: Non-Convex ] [ OPT: Stochastic ] [ DL: Everything Else ] [ SA: Accountability, Transparency and Interpretability ]

Wed 20 Jul 3:30 p.m. PDT — 5:30 p.m. PDT
Spotlight presentation: Social Aspects/MISC
Wed 20 Jul 7:30 a.m. PDT — 9 a.m. PDT


We study the effects of constrained optimization formulations and Frank-Wolfe algorithms for obtaining interpretable neural network predictions. Reformulating the Rate-Distortion Explanations (RDE) method for relevance attribution as a constrained optimization problem provides precise control over the sparsity of relevance maps. This enables both a novel multi-rate and a relevance-ordering variant of RDE, which empirically outperform standard RDE and other baseline methods in a well-established comparison test. We showcase several deterministic and stochastic variants of the Frank-Wolfe algorithm and their effectiveness for RDE.
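The core idea of the constrained formulation can be sketched with a minimal Frank-Wolfe loop. The sketch below is illustrative only, not the paper's implementation: it assumes a hypothetical smooth distortion objective (here a toy quadratic) and a relevance map `s` constrained to the set {s ∈ [0,1]^n : Σ s ≤ k}, so that the sparsity budget `k` is enforced exactly by the feasible region rather than by a penalty weight. The linear minimization oracle (LMO) for this set places mass on the k coordinates with the most negative gradient entries.

```python
import numpy as np

def lmo_ksparse_box(grad, k):
    """LMO for {s in [0,1]^n : sum(s) <= k}: minimize <grad, v> over the set
    by setting v=1 on the k most negative gradient coordinates (only where
    the gradient is actually negative), v=0 elsewhere."""
    v = np.zeros_like(grad)
    idx = np.argsort(grad)[:k]            # k smallest gradient entries
    v[idx] = (grad[idx] < 0).astype(float)
    return v

def frank_wolfe(grad_f, x0, k, iters=200):
    """Vanilla Frank-Wolfe with the open-loop step size 2/(t+2).
    Iterates remain feasible as convex combinations of feasible points."""
    x = x0.copy()
    for t in range(iters):
        g = grad_f(x)
        v = lmo_ksparse_box(g, k)
        gamma = 2.0 / (t + 2.0)
        x = (1.0 - gamma) * x + gamma * v
    return x

# Toy stand-in for a distortion objective: pull s toward a dense "relevance"
# target; the constraint set forces the solution to spend at most k units of mass.
rng = np.random.default_rng(0)
target = rng.random(20)
grad_f = lambda s: s - target             # gradient of 0.5 * ||s - target||^2
s = frank_wolfe(grad_f, np.zeros(20), k=5)
print(round(float(np.sum(s)), 3))         # total mass never exceeds k = 5
```

Because every LMO vertex satisfies the budget and the iterate is always a convex combination of such vertices, feasibility (and hence the sparsity budget) holds at every iteration, which is the "precise control over sparsity" that a Lagrangian penalty formulation does not give directly.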
