Trainable Decoding of Sets of Sequences for Neural Sequence Models
Ashwin Kalyan · Peter Anderson · Stefan Lee · Dhruv Batra

Thu Jun 13 12:05 PM -- 12:10 PM (PDT) @ Hall B
Many structured prediction tasks admit multiple correct outputs, so it is often useful to decode a set of outputs that maximizes some task-specific set-level metric. However, retooling standard sequence prediction procedures, which are tailored towards predicting the single best output, leads to decoded sets containing very similar sequences that fail to capture the variation in the output space. To address this, we propose $\nabla$BS, a trainable decoding procedure that outputs a set of sequences that is highly valued according to the metric. Our method tightly integrates the training and decoding phases and further allows for direct optimization of the task-specific metric, addressing the shortcomings of standard sequence prediction. Further, we discuss the trade-offs of commonly used set-level metrics and motivate a new set-level metric that naturally evaluates the notion of "capturing the variation in the output space". Finally, we show results on the image captioning task and find that our model outperforms standard techniques and natural ablations.

Author Information

Ashwin Kalyan (Georgia Tech)
Peter Anderson (Georgia Tech)
Stefan Lee (Georgia Institute of Technology)
Dhruv Batra (Georgia Institute of Technology / Facebook AI Research)
