
Workshop Poster
Workshop: ICML 2021 Workshop on Computational Biology

Statistical correction of input gradients for black box models trained with categorical input features

Antonio Majdandzic


Gradients of a model's prediction with respect to its inputs are used in a variety of downstream analyses for deep neural networks (DNNs), including post hoc explanations with attribution methods. In many tasks, DNNs are trained on categorical input features subject to value constraints. A notable example is DNA sequences, where the one-hot encoded inputs lie on a probabilistic simplex. Here we observe that outside of this simplex, where no data points anchor the function during training, the learned function can exhibit erratic behavior. The gradients can therefore take arbitrary directions away from the data simplex, which manifests as noise in the gradients. This can introduce significant errors into downstream applications that rely on input gradients, such as attribution maps. We introduce a simple correction for this off-simplex-derived noise and demonstrate its effectiveness quantitatively and qualitatively for DNNs trained on regulatory genomics data. We find that our correction consistently leads to a small but significant improvement in gradient-based attribution scores, especially when the direction of the gradients deviates strongly from the simplex.
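The abstract does not spell out the correction's exact form. A minimal sketch of the idea, assuming the correction projects each position's gradient onto the simplex's tangent space by removing the per-position mean across nucleotide channels (the function name and shapes here are illustrative, not from the paper):

```python
import numpy as np

def correct_gradients(grads):
    """Remove the off-simplex component of input gradients.

    grads: array of shape (L, 4) -- gradient of the model output with
    respect to a one-hot DNA input of length L with 4 nucleotide channels.

    One-hot inputs satisfy a sum-to-one constraint per position, so any
    gradient component along the all-ones channel direction points off the
    simplex and cannot correspond to a valid input perturbation. Subtracting
    the per-position mean across channels removes exactly that component.
    """
    return grads - grads.mean(axis=-1, keepdims=True)

# Toy gradients for a length-2 sequence.
g = np.array([[0.5, -0.2, 0.1, 0.4],
              [1.0,  0.0, 0.0, -0.5]])
g_corr = correct_gradients(g)

# After correction, the channels sum to zero at every position,
# i.e. the off-simplex (all-ones) direction has been projected out.
print(g_corr.sum(axis=-1))
```

A quick check on the corrected gradients confirms the per-position channel sums vanish, while within-position differences between channels, the quantity attribution maps actually use, are unchanged.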
