Invited Talk in Workshop: Learning to Generate Natural Language
Andre Martins: Beyond Softmax: Sparsemax, Constrained Softmax, Differentiable Easy-First
In the first part of the talk, I will propose sparsemax, a new activation function similar to the traditional softmax, but able to output sparse probabilities. After deriving its properties, I will show how its Jacobian can be computed efficiently, enabling its use in a network trained with backpropagation. I will then propose a new smooth and convex loss function that is the sparsemax analogue of the logistic loss, revealing an unexpected connection with the Huber classification loss. I will show promising empirical results on multi-label classification problems and in attention-based neural networks for natural language inference.
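For readers who want a concrete picture, here is a minimal NumPy sketch of the sparsemax activation, assuming the standard characterisation as a Euclidean projection of the scores onto the probability simplex; the function name, the thresholding details, and the example scores are illustrative and not the authors' reference implementation.

```python
import numpy as np

def sparsemax(z):
    """Sketch of sparsemax: project the score vector z onto the
    probability simplex, which zeroes out low-scoring coordinates."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]               # scores in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum       # coordinates kept in the support
    k_z = k[support][-1]                      # support size
    tau = (cumsum[support][-1] - 1) / k_z     # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)           # sparse probability vector

# Softmax would spread mass over all four entries; sparsemax keeps only two.
print(sparsemax([2.0, 1.5, 0.1, -1.0]))       # roughly [0.75, 0.25, 0.0, 0.0]
```

Because the mapping reduces to a simple thresholding over the support, its Jacobian also has a compact closed form, which is what makes the efficient backpropagation mentioned above possible.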
In the second part, I will introduce constrained softmax, another activation function, which allows imposing upper-bound constraints on attention probabilities. Based on this activation, I will introduce a novel, end-to-end differentiable neural easy-first decoder that learns to solve sequence tagging tasks in a flexible order. The decoder iteratively updates a "sketch" of the predictions over the sequence. The proposed models compare favourably to BiLSTM taggers on three sequence tagging tasks.
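As a rough illustration of what an activation with upper-bound constraints might look like, the sketch below computes a distribution proportional to exp(z) while capping each probability at u[i] and redistributing the excess mass over the uncapped coordinates. The iterative clipping scheme, the function name, and the example values are assumptions for illustration, not necessarily the exact formulation used in the talk.

```python
import numpy as np

def constrained_softmax(z, u, max_iter=None):
    """Sketch of a softmax with per-coordinate upper bounds u:
    coordinates whose softmax weight would exceed their bound are fixed
    at the bound, and the leftover mass is renormalised over the rest."""
    z = np.asarray(z, dtype=float)
    u = np.asarray(u, dtype=float)
    assert u.sum() >= 1.0, "the upper bounds must leave room for a distribution"
    p = np.zeros_like(z)
    free = np.ones(len(z), dtype=bool)           # coordinates not yet at their bound
    for _ in range(max_iter or len(z)):
        mass = 1.0 - p[~free].sum()              # probability mass still to distribute
        e = np.exp(z[free] - z[free].max())      # numerically stable softmax weights
        cand = mass * e / e.sum()
        over = cand > u[free]                    # bounds violated by the proposal
        if not over.any():
            p[free] = cand                       # remaining coordinates are feasible
            break
        idx = np.where(free)[0][over]
        p[idx] = u[idx]                          # clip violators to their upper bounds
        free[idx] = False
    return p

# Uncapped, the first entry would take about 0.62 of the mass; capping it
# at 0.4 pushes the excess onto the other two entries.
print(constrained_softmax([2.0, 1.0, 0.5], [0.4, 1.0, 1.0]))  # ~[0.4, 0.373, 0.227]
```

In an easy-first decoder, bounds of this kind can, for example, cap how much total attention any one position accumulates across the iterative sketch updates; the precise role of the constraints is part of the talk.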
This is joint work with Ramon Astudillo and Julia Kreutzer.