

Session

Other Models and Methods 2


Fri 13 July 0:30 - 0:50 PDT

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

Been Kim · Martin Wattenberg · Justin Gilmer · Carrie Cai · James Wexler · Fernanda Viégas · Rory Sayres

The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net's internal state in terms of human-friendly concepts. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result: for example, how sensitive a prediction of “zebra” is to the presence of stripes. Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.
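
As a rough sketch of the mechanics (not the authors' released implementation): a CAV can be obtained by training a linear classifier to separate a concept's example activations from random counterexamples at a chosen layer, and the TCAV score is the fraction of a class's inputs whose logit increases along that direction. Function names and the scikit-learn classifier below are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    """Fit a linear classifier separating a concept's example activations
    from random-example activations at a chosen layer; the CAV is the
    (unit-normalized) normal vector of the decision boundary."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)),
                        np.zeros(len(random_acts))])
    v = LogisticRegression().fit(X, y).coef_[0]
    return v / np.linalg.norm(v)

def tcav_score(class_grads, cav):
    """TCAV score: fraction of class examples whose class logit has a
    positive directional derivative along the CAV, where class_grads
    holds one gradient (w.r.t. the layer activations) per example."""
    return float(np.mean(class_grads @ cav > 0))
```

In the paper, scores are additionally tested for statistical significance against CAVs trained on multiple random counterexample sets.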

Fri 13 July 0:50 - 1:00 PDT

Learning equations for extrapolation and control

Subham S Sahoo · Christoph H. Lampert · Georg Martius

We present an approach to identify concise equations from data using a shallow neural network. In contrast to ordinary black-box regression, this approach allows understanding functional relations and generalizing them from observed data to unseen parts of the parameter space. We show how to extend the class of learnable equations for a recently proposed equation learning network to include division, and we improve the learning and model-selection strategy to be useful for challenging real-world data. For systems governed by analytical expressions, our method can in many cases identify the true underlying equation and extrapolate to unseen domains. We demonstrate its effectiveness in experiments on a cart-pendulum system, where only two random rollouts are required to learn the forward dynamics and successfully achieve the swing-up task.
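
A minimal sketch of this flavor of architecture, assuming PyTorch and my own class and function names: hidden layers mix identity, sine, cosine, and multiplication units, and the extension to division enters as a thresholded ("protected") division at the output. The paper's sparsity regularization and model-selection procedure are omitted here:

```python
import torch
import torch.nn as nn

class EQLLayer(nn.Module):
    """One hidden layer of an equation-learner-style network: a linear
    map whose outputs feed interpretable base units (identity, sine,
    cosine) and pairwise multiplication units."""
    def __init__(self, in_dim, n_units):
        super().__init__()
        # five argument streams per unit: identity, sin, cos, and the
        # two inputs of a multiplication unit
        self.lin = nn.Linear(in_dim, 5 * n_units)

    def forward(self, x):
        a, b, c, d, e = self.lin(x).chunk(5, dim=-1)
        return torch.cat([a, torch.sin(b), torch.cos(c), d * e], dim=-1)

def protected_division(numerator, denominator, theta=1e-2):
    """'Protected' division for the output layer: real division where
    the denominator exceeds a threshold, zero elsewhere, so gradients
    stay finite near poles (the paper additionally penalizes such
    regions during training)."""
    safe = denominator.clamp(min=theta)
    return torch.where(denominator > theta, numerator / safe,
                       torch.zeros_like(numerator))
```

Because every unit computes an elementary function, a trained network of one or two such layers can be read off directly as a closed-form equation.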

Fri 13 July 1:00 - 1:10 PDT

PDE-Net: Learning PDEs from Data

Zichao Long · Yiping Lu · Xianzhong Ma · Bin Dong

Partial differential equations (PDEs) play a prominent role in many disciplines of science and engineering. PDEs are commonly derived from empirical observations. However, with the rapid development of sensors, computational power, and data storage in the past decade, huge quantities of data can be easily collected and efficiently stored. Such vast quantities of data offer new opportunities for the data-driven discovery of physical laws. Inspired by the latest developments in neural network design, we propose a new feed-forward deep network, called PDE-Net, that fulfills two objectives at the same time: to accurately predict the dynamics of complex systems and to uncover the underlying hidden PDE models. Compared with existing approaches, our approach is the most flexible, learning both the differential operators and the nonlinear response function of the underlying PDE model. A special feature of the proposed PDE-Net is that all filters are properly constrained, which enables us to easily identify the governing PDE models while still maintaining the expressive and predictive power of the network. These constraints are carefully designed by fully exploiting the relation between the orders of differential operators and the orders of the sum rules of filters (an important concept originating in wavelet theory). Numerical experiments show that PDE-Net has the potential to uncover the hidden PDE of the observed dynamics and to predict the dynamical behavior for a relatively long time, even in a noisy environment.
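
The moment constraint at the heart of this design can be made concrete. In the following NumPy sketch (the function name and axis conventions are my assumptions, not the authors' code), the moments of a filter are a linear, invertible function of its entries, so a filter can be parameterized by its moment matrix with a few low-order entries frozen to pin the differential operator it approximates:

```python
import numpy as np
from math import factorial

def filter_from_moments(M):
    """Recover an N x N convolution filter q from its moment matrix M,
    where m[i, j] = (1 / (i! j!)) * sum_{k, l} k**i * l**j * q[k, l]
    and k, l run over centered grid offsets.  The map q -> M is linear
    and invertible, so fixing a few low-order moments pins the
    differential operator the filter approximates while the remaining
    entries of M can stay learnable."""
    N = M.shape[0]
    offs = np.arange(N) - N // 2                 # centered offsets
    V = np.vander(offs, N, increasing=True).T    # V[i, k] = offs[k]**i
    A = np.diag([1.0 / factorial(i) for i in range(N)]) @ V
    Ainv = np.linalg.inv(A)                      # since M = A @ q @ A.T
    return Ainv @ M @ Ainv.T

# Example: a 3x3 filter pinned to approximate a first derivative along
# the filter's first axis (m[1, 0] = 1, all other moments zero); the
# result is the familiar central-difference stencil.
M = np.zeros((3, 3))
M[1, 0] = 1.0
q = filter_from_moments(M)
```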

Fri 13 July 1:10 - 1:20 PDT

Transformation Autoregressive Networks

Junier Oliva · Kumar Avinava Dubey · Manzil Zaheer · Barnabás Póczos · Ruslan Salakhutdinov · Eric Xing · Jeff Schneider

The fundamental task of general density estimation $p(x)$ has been of keen interest to machine learning. In this work, we attempt to systematically characterize methods for density estimation. Broadly speaking, most existing methods can be categorized as using either: a) autoregressive models to estimate the conditional factors of the chain rule, $p(x_{i}\, |\, x_{i-1}, \ldots)$; or b) non-linear transformations of variables of a simple base distribution. To better study the characteristics of these categories, we propose multiple methods for each. For example, we propose RNN-based transformations to model non-Markovian transformations of variables. Further, through a comprehensive study over both real-world and synthetic data, we show that jointly leveraging transformations of variables and autoregressive conditional models results in a considerable improvement in performance. We illustrate the use of our models in outlier detection and image modeling. Finally, we introduce a novel data-driven framework for learning a family of distributions.
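
A minimal sketch of the advocated combination, assuming PyTorch and an invented class name: the simplest instance of each ingredient, a lower-triangular invertible linear transformation of variables followed by Gaussian autoregressive conditionals, trained by exact maximum likelihood via the change-of-variables formula. The paper's RNN-based transformations and richer conditionals would drop into the same two slots:

```python
import torch
import torch.nn as nn

class LinearARFlow(nn.Module):
    """Invertible (lower-triangular, positive-diagonal) linear change of
    variables z = L x, followed by Gaussian autoregressive conditionals
    p(z_i | z_{<i}); log p(x) = sum_i log p(z_i | z_{<i}) + log|det L|."""
    def __init__(self, d):
        super().__init__()
        self.d = d
        self.L = nn.Parameter(torch.eye(d))
        # one tiny conditional model per dimension: z_{<i} -> (mu, log sigma)
        self.cond = nn.ModuleList(nn.Linear(max(i, 1), 2) for i in range(d))

    def log_prob(self, x):
        # enforce invertibility: strictly lower part + positive diagonal
        L = torch.tril(self.L, -1) + torch.diag(
            nn.functional.softplus(torch.diagonal(self.L)))
        z = x @ L.T
        logdet = torch.log(torch.diagonal(L)).sum()
        ll = torch.zeros(x.shape[0])
        for i in range(self.d):
            ctx = z[:, :i] if i > 0 else torch.zeros_like(z[:, :1])
            mu, log_sigma = self.cond[i](ctx).unbind(-1)
            ll = ll + torch.distributions.Normal(
                mu, log_sigma.exp()).log_prob(z[:, i])
        return ll + logdet
```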

Fri 13 July 1:20 - 1:30 PDT

Weightless: Lossy weight encoding for deep neural network compression

Brandon Reagen · Udit Gupta · Bob Adolf · Michael Mitzenmacher · Alexander Rush · Gu-Yeon Wei · David Brooks

The large memory requirements of deep neural networks limit their deployment and adoption on many devices. Model compression methods effectively reduce the memory requirements of these models, usually by applying transformations such as weight pruning or quantization. In this paper, we present a novel scheme for lossy weight encoding co-designed with weight simplification techniques. The encoding is based on the Bloomier filter, a probabilistic data structure that saves space at the cost of introducing random errors. By leveraging the ability of neural networks to tolerate these imperfections and by re-training around the errors, the proposed technique, named Weightless, can compress weights by up to 496x without loss of model accuracy. This results in up to a 1.51x improvement over the state of the art.
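
To make the Bloomier-filter mechanics concrete, here is a didactic sketch (stand-in hash functions, invented names, none of the paper's tuning): each stored key, e.g. the index of a retained quantized weight, maps to k cells of a small XOR table. Construction "peels" keys that own a private cell and back-fills the table so that XOR-ing a key's cells returns its value exactly, while unstored keys decode to effectively random values, which are the errors that retraining absorbs:

```python
import random
from functools import reduce

def cells(key, m, k=3):
    """k distinct table cells for a key; a stand-in for the k
    independent hash functions a real Bloomier filter would use."""
    return random.Random(key).sample(range(m), k)

def build_table(kv, m, k=3):
    """Greedy 'peeling' construction: repeatedly pull out a key owning a
    cell no other remaining key touches, then fill the table in reverse
    so XOR-ing each stored key's cells yields its value.  Returns None
    if peeling gets stuck; a real encoder would retry with different
    hashes or a larger m."""
    remaining, order = dict(kv), []
    while remaining:
        for key in list(remaining):
            owned = [c for c in cells(key, m, k)
                     if not any(c in cells(o, m, k)
                                for o in remaining if o != key)]
            if owned:
                order.append((key, remaining.pop(key), owned[0]))
                break
        else:
            return None  # peeling stuck
    table = [0] * m
    for key, val, free in reversed(order):
        rest = reduce(lambda a, c: a ^ table[c],
                      (c for c in cells(key, m, k) if c != free), 0)
        table[free] = val ^ rest
    return table

def lookup(table, key, m, k=3):
    """XOR the key's cells; exact for stored keys, effectively random
    for keys never stored."""
    return reduce(lambda a, c: a ^ table[c], cells(key, m, k), 0)

# Toy usage: store quantized nonzero weights keyed by index.
weights = {0: 5, 3: 2, 7: 6, 9: 1}
table = build_table(weights, m=16)
if table is not None:
    assert all(lookup(table, i, 16) == v for i, v in weights.items())
```

The space saving comes from making m much smaller than the key universe while each entry holds only a few bits; the paper co-designs this with pruning and quantization so that the surviving errors are cheap to retrain around.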