

Session

Deep Learning 3

Moderator: Tanuj Sur

Thu 22 July 20:30 - 20:35 PDT

Spotlight
Discretization Drift in Two-Player Games

Mihaela Rosca · Yan Wu · Benoit Dherin · David GT Barrett

Gradient-based methods for two-player games produce rich dynamics that can solve challenging problems, yet can be difficult to stabilize and understand. Part of this complexity originates from the discrete update steps given by simultaneous or alternating gradient descent, which cause each player to drift away from the continuous gradient flow, a phenomenon we call discretization drift. Using backward error analysis, we derive modified continuous dynamical systems that closely follow the discrete dynamics. These modified dynamics provide insight into the notorious challenges associated with zero-sum games, including Generative Adversarial Networks. In particular, we identify distinct components of the discretization drift that can alter performance and, in some cases, destabilize the game. Finally, quantifying discretization drift allows us to identify regularizers that explicitly cancel harmful forms of drift or strengthen beneficial ones, and thus improve the performance of GAN training.
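As a point of reference for the two update schemes the abstract contrasts, here is a minimal Python sketch of simultaneous versus alternating gradient descent on a toy bilinear zero-sum game; the game, learning rate, and step count are illustrative choices and not taken from the paper.

```python
# Toy bilinear zero-sum game: player 1 minimises f(x, y) = x * y,
# player 2 minimises -f(x, y). The continuous gradient flow orbits the
# equilibrium (0, 0); the discrete updates below drift away from that flow.

def grad_x(x, y):
    return y          # d/dx of x * y

def grad_y(x, y):
    return -x         # d/dy of -(x * y)

def simultaneous_gd(x, y, lr=0.1, steps=100):
    """Both players step from the same iterate."""
    for _ in range(steps):
        x, y = x - lr * grad_x(x, y), y - lr * grad_y(x, y)
    return x, y

def alternating_gd(x, y, lr=0.1, steps=100):
    """Player 2 steps against player 1's already-updated iterate."""
    for _ in range(steps):
        x = x - lr * grad_x(x, y)
        y = y - lr * grad_y(x, y)
    return x, y

# On this game, simultaneous updates spiral outwards (the squared norm grows
# by a factor of 1 + lr**2 per step), while alternating updates stay bounded
# for small step sizes: the two schemes drift in different ways.
print(simultaneous_gd(1.0, 1.0))
print(alternating_gd(1.0, 1.0))
```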

Thu 22 July 20:35 - 20:40 PDT

Spotlight
Elementary superexpressive activations

Dmitry Yarotsky

We call a finite family of activation functions "superexpressive" if any multivariate continuous function can be approximated by a neural network that uses these activations and has a fixed architecture depending only on the number of input variables (i.e., to achieve any accuracy we only need to adjust the weights, without increasing the number of neurons). Previously, it was known that superexpressive activations exist, but their form was quite complex. We give examples of very simple superexpressive families: for example, we prove that the family {sin, arcsin} is superexpressive. We also show that most practical activations (not involving periodic functions) are not superexpressive.
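Written out, the definition in the first sentence amounts to the following; the symbol N_{A,d} for the fixed architecture is notation introduced only for this restatement, and the compact domain [0,1]^d stands in for "multivariate continuous function" in the usual sense of uniform approximation on compact sets.

```latex
% A finite family of activations \mathcal{A} is superexpressive if, for every
% input dimension d, there is one fixed architecture N_{\mathcal{A},d} built
% only from activations in \mathcal{A} such that
\[
\forall f \in C\bigl([0,1]^d\bigr),\ \forall \varepsilon > 0,\ \exists\, \theta:\quad
\sup_{x \in [0,1]^d} \bigl| f(x) - N_{\mathcal{A},d}(x;\theta) \bigr| < \varepsilon ,
\]
% i.e. any accuracy is reached by adjusting the weights \theta alone, without
% adding neurons; the paper shows that \mathcal{A} = \{\sin, \arcsin\} suffices.
```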

Thu 22 July 20:40 - 20:45 PDT

Spotlight
Regularizing towards Causal Invariance: Linear Models with Proxies

Michael Oberst · Nikolaj Thams · Jonas Peters · David Sontag

We propose a method for learning linear models whose predictive performance is robust to causal interventions on unobserved variables, when noisy proxies of those variables are available. Our approach takes the form of a regularization term that trades off in-distribution performance against robustness to interventions. Under the assumption of a linear structural causal model, we show that a single proxy can be used to create estimators that are prediction optimal under interventions of bounded strength. This strength depends on the magnitude of the measurement noise in the proxy, which is, in general, not identifiable. In the case of two proxy variables, we propose a modified estimator that is prediction optimal under interventions up to a known strength. We further show how to extend these estimators to scenarios where additional information about the "test time" intervention is available during training. We evaluate our theoretical findings in synthetic experiments and on real data of hourly pollution levels across several cities in China.
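Schematically, the kind of objective described here has the shape below; the penalty Omega_W, the proxy symbol W, and the trade-off parameter gamma are placeholder notation for this summary rather than the paper's exact estimator.

```latex
\[
\hat{\beta}_{\gamma} \;=\; \arg\min_{\beta}\;
\underbrace{\mathbb{E}\bigl[(Y - \beta^{\top} X)^{2}\bigr]}_{\text{in-distribution risk}}
\;+\; \gamma\,
\underbrace{\Omega_{W}(\beta)}_{\text{penalty built from the proxy } W}
\]
```

Larger values of gamma buy robustness to stronger interventions at the cost of in-distribution fit, which is the trade-off the abstract describes.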

Thu 22 July 20:45 - 20:50 PDT

Spotlight
A Language for Counterfactual Generative Models

Zenna Tavares · James Koppel · Xin Zhang · Ria Das · Armando Solar-Lezama

We present Omega, a probabilistic programming language with support for counterfactual inference. Counterfactual inference means observing some fact in the present and inferring what would have happened had some past intervention been taken, e.g., "given that medication was not effective at dose x, what is the probability that it would have been effective at dose 2x?" We accomplish this by introducing a new operator, akin to Pearl's do, to probabilistic programming; we define its formal semantics, provide an implementation, and demonstrate its utility through examples in a variety of simulation models.
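The abstract does not show Omega's syntax; purely as background, the sketch below computes a counterfactual in a toy linear structural causal model via Pearl's abduction, action, and prediction steps (the model, slope, and observed values are invented for illustration).

```python
# Toy linear SCM, invented for illustration:
#   effect = slope * dose + noise
# Factual observation: at dose 1.0 we measured effect 1.5.

def counterfactual_effect(observed_dose, observed_effect, new_dose, slope=2.0):
    # 1. Abduction: recover the exogenous noise consistent with the observation.
    noise = observed_effect - slope * observed_dose
    # 2. Action: intervene by setting the dose to its counterfactual value.
    # 3. Prediction: push the same noise through the modified model.
    return slope * new_dose + noise

# "Given the effect observed at dose x, what would it have been at dose 2x?"
print(counterfactual_effect(observed_dose=1.0, observed_effect=1.5, new_dose=2.0))
# prints 3.5: the inferred noise of -0.5 is carried into the counterfactual world
```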

Thu 22 July 20:50 - 20:55 PDT

Spotlight
How rotational invariance of common kernels prevents generalization in high dimensions

Konstantin Donhauser · Mingqi Wu · Fanny Yang

Kernel ridge regression is well known to achieve minimax optimal rates in low-dimensional settings. However, its behavior in high dimensions is much less understood. Recent work establishes consistency for high-dimensional kernel regression under a number of specific assumptions on the data distribution. In this paper, we show that in high dimensions, the rotational invariance property of commonly studied kernels (such as RBF, inner product kernels, and the fully-connected NTK of any depth) leads to inconsistent estimation unless the ground truth is a low-degree polynomial. Our lower bound on the generalization error holds for a wide range of distributions and kernels with different eigenvalue decays. This lower bound suggests that consistency results for kernel ridge regression in high dimensions generally require a more refined analysis that depends on the structure of the kernel beyond its eigenvalue decay.
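For concreteness, rotational invariance here means that the kernel is unchanged when both inputs are rotated by the same orthogonal map, or equivalently that it depends on its inputs only through their norms and inner product, a form that covers the kernels listed in the abstract:

```latex
\[
k(Ux, Ux') = k(x, x') \ \ \forall\, U \in O(d)
\quad\Longleftrightarrow\quad
k(x, x') = \kappa\bigl(\lVert x \rVert,\, \lVert x' \rVert,\, \langle x, x' \rangle\bigr),
\]
\[
\text{e.g.}\quad
k_{\mathrm{RBF}}(x, x') = \exp\!\Bigl(-\tfrac{\lVert x - x' \rVert^{2}}{2\sigma^{2}}\Bigr),
\qquad
k_{\mathrm{IP}}(x, x') = g\bigl(\langle x, x' \rangle\bigr).
\]
```

The RBF kernel fits this form because ||x - x'||^2 = ||x||^2 + ||x'||^2 - 2<x, x'>, and the fully-connected NTK is likewise known to depend on its inputs only through these three quantities.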

Thu 22 July 20:55 - 21:00 PDT

Q&A