Skip to yearly menu bar Skip to main content


Session

Structured Prediction 2

Abstract:
Chat is not available.

Thu 12 July 2:00 - 2:20 PDT

Learning Maximum-A-Posteriori Perturbation Models for Structured Prediction in Polynomial Time

Asish Ghoshal · Jean Honorio

MAP perturbation models have emerged as a powerful framework for inference in structured prediction. Such models provide a way to efficiently sample from the Gibbs distribution and facilitate predictions that are robust to random noise. In this paper, we propose a provably polynomial time randomized algorithm for learning the parameters of perturbed MAP predictors. Our approach is based on minimizing a novel Rademacher-based generalization bound on the expected loss of a perturbed MAP predictor, which can be computed in polynomial time. We obtain conditions under which our randomized learning algorithm can guarantee generalization to unseen examples.

Thu 12 July 2:20 - 2:40 PDT

Differentiable Dynamic Programming for Structured Prediction and Attention

Arthur Mensch · Mathieu Blondel

Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, many DP algorithms are non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on structured prediction (audio-to-score alignment, NER) and on structured and sparse attention for translation.

Thu 12 July 2:40 - 2:50 PDT

Structured Output Learning with Abstention: Application to Accurate Opinion Prediction

Alexandre Garcia · Telecom-ParisTech ChloĆ© Clavel · Slim Essid · Florence d'Alche-Buc

Motivated by Supervised Opinion Analysis, we propose a novel framework devoted to Structured Output Learning with Abstention (SOLA). The structure prediction model is able to abstain from predicting some labels in the structured output at a cost chosen by the user in a flexible way. For that purpose, we decompose the problem into the learning of a pair of predictors, one devoted to structured abstention and the other, to structured output prediction. To compare fully labeled training data with predictions potentially containing abstentions, we define a wide class of asymmetric abstention-aware losses. Learning is achieved by surrogate regression in an appropriate feature space while prediction with abstention is performed by solving a new pre-image problem. Thus, SOLA extends recent ideas about Structured Output Prediction via surrogate problems and calibration theory and enjoys statistical guarantees on the resulting excess risk. Instantiated on a hierarchical abstention-aware loss, SOLA is shown to be relevant for fine-grained opinion mining and gives state-of-the-art results on this task. Moreover, the abstention-aware representations can be used to competitively predict user-review ratings based on a sentence-level opinion predictor.

Thu 12 July 2:50 - 3:00 PDT

End-to-End Learning for the Deep Multivariate Probit Model

Di Chen · Yexiang Xue · Carla Gomes

The multivariate probit model (MVP) is a popular classic model for studying binary responses of multiple entities. Nevertheless, the computational challenge of learning the MVP model, given that its likelihood involves integrating over a multidimensional constrained space of latent variables, significantly limits its application in practice. We propose a flexible deep generalization of the classic MVP, the Deep Multivariate Probit Model (DMVP), which is an end-to-end learning scheme that uses an efficient parallel sampling process of the multivariate probit model to exploit GPU-boosted deep neural networks. We present both theoretical and empirical analysis of the convergence behavior of DMVP's sampling process with respect to the resolution of the correlation structure. We provide convergence guarantees for DMVP and our empirical analysis demonstrates the advantages of DMVP's sampling compared with standard MCMC-based methods. We also show that when applied to multi-entity modelling problems, which are natural DMVP applications, DMVP trains faster than classical MVP, by at least an order of magnitude, captures rich correlations among entities, and further improves the joint likelihood of entities compared with several competitive models.