Track: Probabilistic Methods 5

Thu 22 July 18:00 - 18:20 PDT

Oral

Outstanding Paper Honorable Mention

Solving high-dimensional parabolic PDEs using the tensor train format

Lorenz Richter · Leon Sallandt · Nikolas Nüsken

High-dimensional partial differential equations (PDEs) are ubiquitous in economics, science and engineering. However, their numerical treatment poses formidable challenges since traditional grid-based methods tend to be frustrated by the curse of dimensionality. In this paper, we argue that tensor trains provide an appealing approximation framework for parabolic PDEs: the combination of reformulations in terms of backward stochastic differential equations and regression-type methods in the tensor format holds the promise of leveraging latent low-rank structures enabling both compression and efficient computation. Following this paradigm, we develop novel iterative schemes, involving either explicit and fast or implicit and accurate updates. We demonstrate in a number of examples that our methods achieve a favorable trade-off between accuracy and computational efficiency in comparison with state-of-the-art neural network based approaches.

Thu 22 July 18:20 - 18:25 PDT

Spotlight

Large Scale Private Learning via Low-rank Reparametrization

Da Yu · Huishuai Zhang · Wei Chen · Jian Yin · Tie-Yan Liu

We propose a reparametrization scheme to address the challenges of applying differentially private SGD on large neural networks, which are 1) the huge memory cost of storing individual gradients, 2) the added noise suffering notorious dimensional dependence. Specifically, we reparametrize each weight matrix with two \emph{gradient-carrier} matrices of small dimension and a \emph{residual weight} matrix. We argue that such reparametrization keeps the forward/backward process unchanged while enabling us to compute the projected gradient without computing the gradient itself. To learn with differential privacy, we design \emph{reparametrized gradient perturbation (RGP)} that perturbs the gradients on gradient-carrier matrices and reconstructs an update for the original weight from the noisy gradients. Importantly, we use historical updates to find the gradient-carrier matrices, whose optimality is rigorously justified under linear regression and empirically verified with deep learning tasks. RGP significantly reduces the memory cost and improves the utility. For example, we are the first able to apply differential privacy on the BERT model and achieve an average accuracy of $83.9\%$ on four downstream tasks with $\epsilon=8$ , which is within $5\%$ loss compared to the non-private baseline but enjoys much lower privacy leakage risk.

Thu 22 July 18:25 - 18:30 PDT

Spotlight

Breaking the Deadly Triad with a Target Network

Shangtong Zhang · Hengshuai Yao · Shimon Whiteson

The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously. In this paper, we investigate the target network as a tool for breaking the deadly triad, providing theoretical support for the conventional wisdom that a target network stabilizes training. We first propose and analyze a novel target network update rule which augments the commonly used Polyak-averaging style update with two projections. We then apply the target network and ridge regularization in several divergent algorithms and show their convergence to regularized TD fixed points. Those algorithms are off-policy with linear function approximation and bootstrapping, spanning both policy evaluation and control, as well as both discounted and average-reward settings. In particular, we provide the first convergent linear $Q$ -learning algorithms under nonrestrictive and changing behavior policies without bi-level optimization.

Thu 22 July 18:30 - 18:35 PDT

Spotlight

Average-Reward Off-Policy Policy Evaluation with Function Approximation

Shangtong Zhang · Yi Wan · Richard Sutton · Shimon Whiteson

We consider off-policy policy evaluation with function approximation (FA) in average-reward MDPs, where the goal is to estimate both the reward rate and the differential value function. For this problem, bootstrapping is necessary and, along with off-policy learning and FA, results in the deadly triad (Sutton & Barto, 2018). To address the deadly triad, we propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting. In terms of estimating the differential value function, the algorithms are the first convergent off-policy linear function approximation algorithms. In terms of estimating the reward rate, the algorithms are the first convergent off-policy linear function approximation algorithms that do not require estimating the density ratio. We demonstrate empirically the advantage of the proposed algorithms, as well as their nonlinear variants, over a competitive density-ratio-based approach, in a simple domain as well as challenging robot simulation tasks.

Thu 22 July 18:35 - 18:40 PDT

Spotlight

Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games

Hongyi Guo · Zuyue Fu · Zhuoran Yang · Zhaoran Wang

We study the global convergence and global optimality of the actor-critic algorithm applied for the zero-sum two-player stochastic games in a decentralized manner. We focus on the single-timescale setting where the critic is updated by applying the Bellman operator only once and the actor is updated by policy gradient with the information from the critic. Our algorithm is in a decentralized manner, as we assume that each player has no access to the actions of the other one, which, in a way, protects the privacy of both players. Moreover, we consider linear function approximations for both actor and critic, and we prove that the sequence of joint policy generated by our decentralized linear algorithm converges to the minimax equilibrium at a sublinear rate (\cO(\sqrt{K})), where (K) is the number of iterations. To the best of our knowledge, we establish the global optimality and convergence of decentralized actor-critic algorithm on zero-sum two-player stochastic games with linear function approximations for the first time.

Thu 22 July 18:40 - 18:45 PDT

Spotlight

Optimal Non-Convex Exact Recovery in Stochastic Block Model via Projected Power Method

Peng Wang · Huikang Liu · Zirui Zhou · Anthony Man-Cho So

In this paper, we study the problem of exact community recovery in the symmetric stochastic block model, where a graph of $n$ vertices is randomly generated by partitioning the vertices into $K \ge 2$ equal-sized communities and then connecting each pair of vertices with probability that depends on their community memberships. Although the maximum-likelihood formulation of this problem is discrete and non-convex, we propose to tackle it directly using projected power iterations with an initialization that satisfies a partial recovery condition. Such an initialization can be obtained by a host of existing methods. We show that in the logarithmic degree regime of the considered problem, the proposed method can exactly recover the underlying communities at the information-theoretic limit. Moreover, with a qualified initialization, it runs in $\mO(n\log^2n/\log\log n)$ time, which is competitive with existing state-of-the-art methods. We also present numerical results of the proposed method to support and complement our theoretical development.

Thu 22 July 18:45 - 18:50 PDT

Spotlight

Optimal Counterfactual Explanations in Tree Ensembles

Axel Parmentier · Thibaut Vidal

Counterfactual explanations are usually generated through heuristics that are sensitive to the search's initial conditions. The absence of guarantees of performance and robustness hinders trustworthiness. In this paper, we take a disciplined approach towards counterfactual explanations for tree ensembles. We advocate for a model-based search aiming at "optimal" explanations and propose efficient mixed-integer programming approaches. We show that isolation forests can be modeled within our framework to focus the search on plausible explanations with a low outlier score. We provide comprehensive coverage of additional constraints that model important objectives, heterogeneous data types, structural constraints on the feature space, along with resource and actionability restrictions. Our experimental analyses demonstrate that the proposed search approach requires a computational effort that is orders of magnitude smaller than previous mathematical programming algorithms. It scales up to large data sets and tree ensembles, where it provides, within seconds, systematic explanations grounded on well-defined models solved to optimality.

Thu 22 July 18:50 - 18:55 PDT

Q&A

Main Navigation

Session

Probabilistic Methods 5

Solving high-dimensional parabolic PDEs using the tensor train format

Large Scale Private Learning via Low-rank Reparametrization

Breaking the Deadly Triad with a Target Network

Average-Reward Off-Policy Policy Evaluation with Function Approximation

Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games

Optimal Non-Convex Exact Recovery in Stochastic Block Model via Projected Power Method

Optimal Counterfactual Explanations in Tree Ensembles

Q&A