

Trainable Decoding of Sets of Sequences for Neural Sequence Models

Ashwin Kalyan · Peter Anderson · Stefan Lee · Dhruv Batra

Pacific Ballroom #48

Keywords: [ Structured Prediction ] [ Deep Sequence Models ]

Abstract: Many sequence prediction tasks admit multiple correct outputs, so it is often useful to decode a set of outputs that maximizes some task-specific set-level metric. However, retooling standard sequence prediction procedures, which are tailored towards predicting the single best output, leads to decoded sets containing very similar sequences that fail to capture the variation in the output space. To address this, we propose $\nabla$BS, a trainable decoding procedure that outputs a set of sequences that is highly valued according to the metric. Our method tightly integrates the training and decoding phases and further allows for the optimization of the task-specific metric, addressing the shortcomings of standard sequence prediction. Further, we discuss the trade-offs of commonly used set-level metrics and motivate a new set-level metric that naturally evaluates the notion of "capturing the variation in the output space". Finally, we show results on the image captioning task and find that our model outperforms standard techniques and natural ablations.
