Session: Interpretability


Thu 13 June 9:00 - 9:20 PDT

Neural Network Attributions: A Causal Perspective

Aditya Chattopadhyay · Piyushi Manupriya · Anirban Sarkar · Vineeth N Balasubramanian

We propose a new attribution method for neural networks, developed from first principles of causality (to the best of our knowledge, the first such approach). The neural network architecture is viewed as a Structural Causal Model, and a methodology to compute the causal effect of each feature on the output is presented. Under reasonable assumptions on the causal structure of the input data, we propose algorithms to compute the causal effects efficiently and to scale the approach to data of large dimensionality. We also show how this method can be applied to recurrent neural networks. We report experimental results on both simulated and real datasets, showcasing the promise and usefulness of the proposed algorithm.
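
As a rough illustration of the interventional view (not the authors' algorithm), the sketch below estimates a causal effect for one feature by clamping it with a do()-style intervention and averaging the model's output over the remaining features; the model and the background distribution are hypothetical stand-ins.

```python
# Minimal sketch of interventional attribution (not the paper's exact method).
import numpy as np

rng = np.random.default_rng(0)

def model(xs):
    # Hypothetical stand-in for a trained network: any map from R^{n x 3} to R^n works here.
    w = np.array([0.8, -0.5, 0.3])
    return np.tanh(xs @ w)

def interventional_mean(i, alpha, background, n=256):
    """Approximate E[y | do(x_i = alpha)] by clamping feature i and averaging
    the model output over samples of the remaining features."""
    xs = background[rng.choice(len(background), size=n)].copy()
    xs[:, i] = alpha                       # the do() intervention on feature i
    return model(xs).mean()

background = rng.normal(size=(1000, 3))    # samples standing in for the input distribution
baseline = model(background).mean()        # E[y] under no intervention

for i in range(3):
    effect = interventional_mean(i, alpha=1.0, background=background) - baseline
    print(f"feature {i}: approximate causal effect of do(x_{i} = 1.0) = {effect:+.3f}")
```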

Thu 13 June 9:20 - 9:25 PDT

Towards a Deep and Unified Understanding of Deep Neural Models in NLP

Chaoyu Guan · Xiting Wang · Quanshi Zhang · Runjin Chen · Di He · Xing Xie

We define a unified information-based measure to provide quantitative explanations of how intermediate layers of deep Natural Language Processing (NLP) models leverage the information of input words. Our method advances existing explanation methods by addressing the issues they exhibit with coherency and generality. The explanations generated by our method are consistent and faithful across different timestamps, layers, and models. We show how our method can be used to understand four widely used NLP models and explain their performance on three real-world benchmark datasets.
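
Purely for illustration (this is not the paper's information measure), the sketch below gauges how much an intermediate layer's representation depends on each word by perturbing that word's embedding with Gaussian noise and measuring how far the representation moves; the GRU layer, sizes, and noise scale are all assumptions.

```python
# Illustrative perturbation-based word sensitivity (not the paper's measure).
import torch

torch.manual_seed(0)
d, seq_len = 16, 5
layer = torch.nn.GRU(d, d, batch_first=True)      # stand-in for an intermediate NLP layer
embeddings = torch.randn(1, seq_len, d)           # one sentence of word embeddings

def sensitivity(word_idx, sigma=0.1, n_samples=64):
    with torch.no_grad():
        base, _ = layer(embeddings)
        total = 0.0
        for _ in range(n_samples):
            noisy = embeddings.clone()
            noisy[0, word_idx] += sigma * torch.randn(d)   # perturb only this word
            out, _ = layer(noisy)
            total += (out - base).norm().item()
    return total / n_samples

scores = [sensitivity(i) for i in range(seq_len)]
print("per-word sensitivity of the layer's representation:", [round(s, 3) for s in scores])
```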

Thu 13 June 9:25 - 9:30 PDT

Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Value Approximation

Marco Ancona · Cengiz Oztireli · Markus Gross

The problem of explaining the behavior of Deep Neural Networks has gained a lot of attention in recent years. While several attribution methods have been proposed, most are based on heuristics without strong theoretical foundations, which raises the question of whether the resulting attributions are reliable. On the other hand, the literature on cooperative game theory suggests Shapley Values as a unique way of assigning relevance scores such that certain desirable properties are satisfied. Previous work on attribution methods has also shown that explanations based on Shapley Values agree better with human intuition. Unfortunately, exact evaluation of Shapley Values is prohibitively expensive, being exponential in the number of input features. In this work, by leveraging recent results on uncertainty propagation, we propose a novel, polynomial-time approximation of Shapley Values in deep neural networks. We show that our method produces significantly better approximations of Shapley Values than existing state-of-the-art attribution methods.
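
To make concrete what is being approximated, the sketch below computes the standard permutation-sampling Monte Carlo estimate of Shapley Values (a common baseline, not the uncertainty-propagation method proposed here); the toy model f and the zero baseline are assumptions.

```python
# Permutation-sampling Shapley estimate: phi_i is the average marginal contribution
# of feature i over random orderings, with absent features set to a baseline value.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical model to be explained.
    return float(x[0] * 2.0 + x[1] * x[2] - 0.5 * x[3])

def shapley_mc(f, x, baseline, n_perm=2000):
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_perm):
        perm = rng.permutation(d)
        z = baseline.copy()
        prev = f(z)
        for i in perm:
            z[i] = x[i]                 # add feature i to the coalition
            cur = f(z)
            phi[i] += cur - prev
            prev = cur
    return phi / n_perm

x = np.array([1.0, 2.0, 3.0, 4.0])
baseline = np.zeros(4)
phi = shapley_mc(f, x, baseline)
print("Monte Carlo Shapley estimates:", np.round(phi, 3))
print("efficiency check (sum vs f(x) - f(baseline)):", round(phi.sum(), 3), round(f(x) - f(baseline), 3))
```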

Thu 13 June 9:30 - 9:35 PDT

Functional Transparency for Structured Data: a Game-Theoretic Approach

Guang-He Lee · Wengong Jin · David Alvarez-Melis · Tommi Jaakkola

We provide a new approach to training neural models to exhibit transparency in a well-defined, functional manner. Our approach naturally operates over structured data and tailors the predictor, functionally, towards a chosen family of (local) witnesses. The estimation problem is set up as a co-operative game between an unrestricted predictor, such as a neural network, and a set of witnesses chosen from the desired transparent family. The goal of the witnesses is to highlight, locally, how well the predictor conforms to the chosen family of functions, while the predictor is trained to minimize the highlighted discrepancy. We emphasize that the predictor remains globally powerful, as it is only encouraged to agree locally with locally adapted witnesses. We analyze the effect of the proposed approach, provide example formulations in the context of deep graph and sequence models, and empirically illustrate the idea in chemical property prediction, temporal modeling, and molecule representation learning.
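
A minimal sketch of the predictor/witness game on a 1-D regression toy problem follows, assuming linear witnesses fit in closed form on fixed input neighbourhoods; the network, neighbourhood radius, and penalty weight are illustrative choices, not the authors' formulation.

```python
# Predictor vs. local linear witnesses: the predictor is penalised by its deviation
# from the best-fitting linear witness on each neighbourhood (illustrative only).
import torch

torch.manual_seed(0)
predictor = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(predictor.parameters(), lr=1e-2)

x = torch.linspace(-3, 3, 256).unsqueeze(1)
y = torch.sin(x) + 0.1 * torch.randn_like(x)
centers = torch.tensor([[-2.0], [0.0], [2.0]])          # neighbourhood centres for the witnesses

def local_discrepancy(xc, radius=0.5):
    """Fit the best local linear witness around xc and return the predictor's
    squared deviation from it on that neighbourhood."""
    mask = (x - xc).abs().squeeze(1) < radius
    xs, preds = x[mask], predictor(x[mask])
    design = torch.cat([xs, torch.ones_like(xs)], dim=1)        # witness features [x, 1]
    beta = torch.linalg.lstsq(design, preds.detach()).solution  # witness held fixed for this step
    return ((preds - design @ beta) ** 2).mean()

for step in range(500):
    fit_loss = ((predictor(x) - y) ** 2).mean()
    witness_loss = sum(local_discrepancy(c) for c in centers) / len(centers)
    loss = fit_loss + 0.1 * witness_loss                # accuracy vs. local transparency trade-off
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"fit loss {fit_loss.item():.4f}, mean local discrepancy {witness_loss.item():.4f}")
```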

Thu 13 June 9:35 - 9:40 PDT

Exploring interpretable LSTM neural networks over multi-variable data

Tian Guo · Tao Lin · Nino Antulov-Fantulin

For a recurrent neural network trained on time series with target and exogenous variables, it is desirable to provide interpretable insights into the data in addition to accurate prediction. In this paper, we explore the structure of LSTM recurrent neural networks to learn variable-wise hidden states, with the aim of capturing different dynamics in multi-variable time series and distinguishing the contribution of each variable to the prediction. With these variable-wise hidden states, a mixture attention mechanism is proposed to model the generative process of the target. We then develop the associated training method to learn network parameters as well as variable and temporal importance with respect to the prediction of the target variable. Extensive experiments on real datasets demonstrate that modeling the dynamics of different variables enhances prediction performance. We also evaluate the interpretation results both qualitatively and quantitatively, which demonstrates the promise of the developed method as an end-to-end framework for both forecasting and knowledge extraction over multi-variable data.
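
The sketch below illustrates the variable-wise-hidden-state idea in a simplified form, using one small LSTM per input variable (rather than a tensorized cell) and a softmax mixture whose weights can be read as variable importance; all dimensions are assumed.

```python
# Simplified variable-wise hidden states + mixture attention (illustrative, not the paper's architecture).
import torch

torch.manual_seed(0)
n_vars, hidden, T = 3, 8, 20
var_lstms = torch.nn.ModuleList([torch.nn.LSTM(1, hidden, batch_first=True) for _ in range(n_vars)])
var_heads = torch.nn.ModuleList([torch.nn.Linear(hidden, 1) for _ in range(n_vars)])   # per-variable prediction
var_scores = torch.nn.ModuleList([torch.nn.Linear(hidden, 1) for _ in range(n_vars)])  # per-variable attention score

x = torch.randn(4, T, n_vars)                      # a batch of multi-variable time series

states = [var_lstms[v](x[:, :, v:v + 1])[0][:, -1] for v in range(n_vars)]   # last hidden state per variable
alpha = torch.softmax(torch.cat([var_scores[v](states[v]) for v in range(n_vars)], dim=1), dim=1)
preds = torch.cat([var_heads[v](states[v]) for v in range(n_vars)], dim=1)
y_hat = (alpha * preds).sum(dim=1, keepdim=True)   # mixture prediction of the target

print("prediction shape:", tuple(y_hat.shape))
print("variable importance weights (first sample):", alpha[0].detach().numpy().round(3))
```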

Thu 13 June 9:40 - 10:00 PDT

TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing

Augustus Odena · Catherine Olsson · David Andersen · Ian Goodfellow

Neural networks are difficult to interpret and debug. We introduce testing techniques for neural networks that can discover errors occurring only for rare inputs. Specifically, we develop coverage-guided fuzzing (CGF) methods for neural networks. In CGF, random mutations of inputs are guided by a coverage metric toward the goal of satisfying user-specified constraints. We describe how approximate nearest neighbor (ANN) algorithms can provide this coverage metric for neural networks. We then combine these methods with techniques for property-based testing (PBT). In PBT, one asserts properties that a function should satisfy and the system automatically generates tests exercising those properties. We then apply this system to practical goals including (but not limited to) surfacing broken loss functions in popular GitHub repositories and making performance improvements to TensorFlow. Finally, we release an open source library called TensorFuzz that implements the described techniques.
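
The toy loop below sketches the flavour of coverage-guided fuzzing (it is not the TensorFuzz implementation): a mutated input joins the corpus when its hidden activations are far from anything seen so far, i.e. it adds "new coverage", and a user-specified property is checked on every mutant. The network, mutation scale, and distance threshold are arbitrary.

```python
# Toy CGF loop: mutate corpus elements, track coverage via distances between
# activation vectors, and flag property violations (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(10, 32)), rng.normal(size=(32, 1))

def activations(x):
    return np.maximum(x @ W1, 0.0)                  # hidden layer used as the coverage signal

def model(x):
    return activations(x) @ W2

def violates_property(x):
    return not np.isfinite(model(x)).all()          # e.g. assert the output is never NaN/inf

corpus = [rng.normal(size=10)]
coverage = [activations(corpus[0])]
threshold = 5.0

for step in range(2000):
    parent = corpus[rng.integers(len(corpus))]
    child = parent + 0.3 * rng.normal(size=10)      # random mutation of a corpus element
    if violates_property(child):
        print("property violation found at step", step)
        break
    act = activations(child)
    nearest = min(np.linalg.norm(act - c) for c in coverage)   # brute-force stand-in for the ANN lookup
    if nearest > threshold:                         # new coverage: keep the mutated input
        corpus.append(child)
        coverage.append(act)

print("corpus size after fuzzing:", len(corpus))
```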

Thu 13 June 10:00 - 10:05 PDT

Gaining Free or Low-Cost Interpretability with Interpretable Partial Substitute

Tong Wang

This work addresses the situation where a black-box model outperforms all of its interpretable competitors. The existing solution to understanding the black-box is to use an explainer model to generate explanations, which can be ambiguous and inconsistent. We propose an alternative solution: find an interpretable substitute on the subset of data where the black-box model is overkill or nearly overkill, use this interpretable model to process that subset, and leave the rest to the black-box. In this way, on that subset of data, the model gains complete interpretability and transparency, replacing otherwise imperfect approximations by an external explainer, and this transparency is obtained at minimal or no cost to predictive performance. Under this framework, we develop the Partial Substitute Rules (PSR) model, which uses decision rules to capture the subspace of data where the rules are as accurate, or almost as accurate, as the black-box. PSR is agnostic to the black-box model. To train a PSR, we devise an efficient search algorithm that iteratively finds the optimal model and exploits theoretically grounded strategies to reduce computation. Experiments on structured and text data show that PSR obtains an effective trade-off between transparency and predictive performance.
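
A minimal sketch of the resulting hybrid prediction scheme is shown below (the rule set here is hypothetical and the rule-search algorithm itself is not shown): inputs covered by the interpretable rules receive a transparent prediction, and everything else is deferred to the black-box.

```python
# Hybrid prediction: rules handle the covered subspace, the black-box handles the rest.
import numpy as np

rng = np.random.default_rng(0)

def black_box(x):
    # Stand-in for any opaque but accurate classifier.
    return int(np.sin(3 * x[0]) + x[1] > 0.5)

rules = [
    (lambda x: x[1] > 1.5, 1),      # (condition on x, predicted label)
    (lambda x: x[1] < -1.0, 0),
]

def hybrid_predict(x):
    for condition, label in rules:
        if condition(x):
            return label, "rule"            # transparent prediction on the covered subspace
    return black_box(x), "black-box"        # defer to the black-box elsewhere

X = rng.normal(size=(1000, 2))
sources = [hybrid_predict(x)[1] for x in X]
print(f"fraction handled transparently by rules: {sources.count('rule') / len(sources):.2%}")
```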

Thu 13 June 10:05 - 10:10 PDT

State-Regularized Recurrent Neural Networks

Cheng Wang · Mathias Niepert

Recurrent neural networks are a widely used class of neural architectures. They have, however, two shortcomings. First, it is difficult to understand what exactly they learn. Second, they tend to work poorly on sequences requiring long-term memorization, despite having this capacity in principle. We aim to address both shortcomings with a class of recurrent networks that use a stochastic state transition mechanism between cell applications. This mechanism, which we term state-regularization, makes RNNs transition between a finite set of learnable states. We evaluate state-regularized RNNs on (1) regular languages for the purpose of automata extraction; (2) nonregular languages such as balanced parentheses, palindromes, and the copy task, where external memory is required; and (3) real-world sequence learning tasks for sentiment analysis, visual object recognition, and language modeling. We show that state-regularization (a) simplifies the extraction of finite state automata modeling an RNN's state transition dynamics; (b) forces RNNs to operate more like automata with external memory and less like finite state machines; and (c) improves the interpretability and explainability of RNNs.
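
The sketch below illustrates one way such a transition mechanism can look (dimensions, temperature, and the soft assignment are illustrative, not the paper's exact mechanism): after each recurrent update, the hidden state is softly snapped onto a small set of learnable centroid states.

```python
# Soft assignment of the hidden state to k learnable centroid states after each step.
import torch

torch.manual_seed(0)
hidden, k = 16, 5
cell = torch.nn.GRUCell(8, hidden)
centroids = torch.nn.Parameter(torch.randn(k, hidden))   # the finite set of learnable states

def regularized_step(x_t, h, temperature=0.5):
    h_new = cell(x_t, h)
    scores = h_new @ centroids.T / temperature            # similarity to each centroid state
    probs = torch.softmax(scores, dim=-1)                 # transition distribution over the k states
    return probs @ centroids, probs                       # convex combination of centroid states

x = torch.randn(10, 3, 8)                                 # (time, batch, features)
h = torch.zeros(3, hidden)
for t in range(10):
    h, probs = regularized_step(x[t], h)

print("final transition distribution (first sequence):", probs[0].detach().numpy().round(2))
```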

Thu 13 June 10:10 - 10:15 PDT

Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation

Sahil Singla · Eric Wallace · Shi Feng · Soheil Feizi

Current methods for interpreting deep learning models by generating saliency maps generally rely on two key assumptions. First, they use first-order approximations of the loss function, neglecting higher-order terms such as the loss curvature. Second, they evaluate each feature's importance in isolation, ignoring feature inter-dependencies. In this work, we study the effect of relaxing these two assumptions. First, by characterizing a closed-form formula for the Hessian matrix of a deep ReLU network, we prove that, for a classification problem with a large number of classes, if an input has a high-confidence classification score, including the Hessian term has only a small impact on the final solution. We prove this result by showing that in this case the Hessian matrix is approximately of rank one and its leading eigenvector is almost parallel to the gradient of the loss function. Our empirical experiments on ImageNet samples are consistent with our theory. This result can also have implications for related problems such as adversarial examples. Second, we compute the importance of group features in deep learning interpretation by introducing a sparsity regularization term. We use an $L_0$-$L_1$ relaxation technique along with proximal gradient descent to compute group-feature importance scores efficiently. Our empirical results indicate that considering group features can significantly improve deep learning interpretation.
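
As a small numerical companion to the rank-one claim (using a toy, untrained ReLU network rather than the paper's closed-form Hessian), the sketch below estimates the leading eigenvector of the input Hessian of the loss via Hessian-vector products and power iteration, then measures how parallel it is to the loss gradient.

```python
# Power iteration on the input Hessian via Hessian-vector products (illustrative check).
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))
x = torch.randn(1, 20, requires_grad=True)
target = torch.tensor([3])

loss = torch.nn.functional.cross_entropy(net(x), target)
grad, = torch.autograd.grad(loss, x, create_graph=True)    # keep the graph for second derivatives

def hessian_vector_product(v):
    return torch.autograd.grad((grad * v).sum(), x, retain_graph=True)[0]

v = torch.randn_like(x)
for _ in range(50):                                        # power iteration on the input Hessian
    v = hessian_vector_product(v)
    v = v / (v.norm() + 1e-12)

cos = torch.nn.functional.cosine_similarity(v.flatten(), grad.flatten(), dim=0)
print("|cos(angle)| between leading Hessian eigenvector and gradient:", round(abs(cos.item()), 3))
```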

Thu 13 June 10:15 - 10:20 PDT

On the Connection Between Adversarial Robustness and Saliency Map Interpretability

Christian Etmann · Sebastian Lunz · Peter Maass · Carola-Bibiane Schönlieb

Recent studies on the adversarial vulnerability of neural networks have shown that models trained to be more robust to adversarial attacks exhibit more interpretable saliency maps than their non-robust counterparts. We aim to quantify this behaviour by considering the alignment between the input image and its saliency map. We hypothesize that as the distance to the decision boundary grows, so does this alignment. The connection holds exactly in the case of linear models. We confirm these theoretical findings with experiments on models trained with a local Lipschitz regularization and identify where the nonlinear nature of neural networks weakens the relation.
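
The alignment in question can be illustrated, in simplified form, as the normalized inner product between an input and its saliency map (the gradient of the class score with respect to the input); the toy model below is an assumption, and the exact normalization in the paper may differ.

```python
# Simplified input/saliency alignment: cosine similarity between the image and the
# gradient of the top class score with respect to that image (stand-in model).
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28, requires_grad=True)        # stand-in for an image in [0, 1]

score = model(x).max()                                   # top-class score
saliency, = torch.autograd.grad(score, x)

alignment = (x * saliency).sum() / (x.norm() * saliency.norm())
print("input/saliency alignment (cosine):", round(alignment.item(), 3))
```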