Hidden in Plain Sight -- Class Competition Focuses Attribution Maps
Nils Philipp Walter ⋅ Jilles Vreeken ⋅ Jonas Fischer
Abstract
Attribution methods reveal which input features a neural network uses for a prediction, adding transparency to its decisions. A common problem is that these attributions are unspecific, highlighting important and irrelevant features alike. We revisit the standard attribution pipeline and observe that using logits as the attribution target is a main cause of this phenomenon. We show that the solution is in plain sight: considering the distribution of attributions over multiple classes with existing attribution methods yields specific and fine-grained attributions. On common benchmarks, including the grid pointing game and randomization-based sanity checks, this improves the performance of 18 attribution methods across 7 architectures by up to $2\times$, in an architecture-agnostic manner.
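To make the class-competition idea concrete, below is a minimal sketch assuming a PyTorch classifier and gradient-times-input as the base attribution method. The function name `class_competitive_attribution` and the softmax-across-classes normalization are illustrative assumptions for this sketch, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def class_competitive_attribution(model, x, target_class, num_classes):
    # Hypothetical sketch: compute gradient-times-input attributions
    # for every class, then let classes compete for each input feature
    # via a softmax across the class dimension. The exact normalization
    # in the paper may differ; this only illustrates the idea of using
    # the distribution of attributions over classes instead of a
    # single logit.
    x = x.clone().requires_grad_(True)
    logits = model(x)  # shape: (batch, num_classes)

    per_class = []
    for c in range(num_classes):
        grad = torch.autograd.grad(
            logits[:, c].sum(), x, retain_graph=True
        )[0]
        per_class.append((grad * x).detach())  # gradient x input, class c

    # Stack to (num_classes, batch, *input_shape). A feature is kept
    # for the target class only insofar as its attribution is larger
    # for that class than for the competing classes.
    attr = torch.stack(per_class, dim=0)
    competition = F.softmax(attr, dim=0)

    # Weight the raw attribution by its class specificity, focusing
    # the map on features that are distinctive for the target class.
    return competition[target_class] * attr[target_class]
```

Under this reading, features that contribute equally to all classes receive a near-uniform competition weight and are suppressed, which is one plausible way unspecific highlights get filtered out.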