Oral
A fully differentiable beam search decoder
Ronan Collobert · Awni Hannun · Gabriel Synnaeve

Thu Jun 13th 10:15 -- 10:20 AM @ Room 201

We introduce a new beam search decoder that is fully differentiable, making it possible to optimize at training time through the inference procedure. Our decoder allows us to combine models which operate at different granularities (e.g. acoustic and language models). It also handles an arbitrary number of target sequence candidates, making it suitable in contexts where labeled data is not aligned to input sequences. We demonstrate that our approach scales by applying it to speech recognition, jointly training acoustic and word-level language models. The system is end-to-end, with gradients flowing through the whole architecture from the word-level transcriptions. Recent research efforts have shown that deep neural networks with attention-based mechanisms are powerful enough to successfully train an acoustic model from the final transcription alone, while implicitly learning a language model. Instead, we show that it is possible to discriminatively train an acoustic model jointly with an explicit and possibly pre-trained language model.
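The key obstacle to training through a decoder is that selecting the best hypothesis (a max over candidates) is not differentiable. A common way to make such a score differentiable is to replace the hard max with a smooth logsumexp over the combined acoustic and language-model scores, so gradients flow to both models. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' implementation; the function names (`sequence_loss`), the `lm_weight` parameter, and the toy scores are hypothetical.

```python
import numpy as np

def logsumexp(scores):
    # Numerically stable smooth maximum over candidate scores.
    m = scores.max()
    return m + np.log(np.exp(scores - m).sum())

def sequence_loss(acoustic_scores, lm_scores, lm_weight, target_idx):
    """Differentiable sequence-level loss over a set of beam candidates.

    acoustic_scores, lm_scores: per-candidate log-scores (1-D arrays).
    lm_weight: interpolation weight for the language model (hypothetical).
    target_idx: index of the ground-truth candidate in the beam.
    """
    # Combine models operating at different granularities into one score.
    total = acoustic_scores + lm_weight * lm_scores
    # Negative log-likelihood of the target among all beam candidates;
    # logsumexp (not max) keeps the whole expression differentiable.
    return logsumexp(total) - total[target_idx]

def loss_grad_wrt_acoustic(acoustic_scores, lm_scores, lm_weight, target_idx):
    # Gradient is softmax(total) minus a one-hot on the target: gradients
    # flow from the word-level transcription back into the acoustic model.
    total = acoustic_scores + lm_weight * lm_scores
    probs = np.exp(total - logsumexp(total))
    grad = probs.copy()
    grad[target_idx] -= 1.0
    return grad
```

Because the gradient of the loss with respect to each score is just softmax minus a one-hot target, training pushes mass toward the correct hypothesis while penalizing competing beam candidates, which is the sense in which the acoustic model is trained discriminatively against an explicit language model.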

Author Information

Ronan Collobert (Facebook AI Research)
Awni Hannun
Gabriel Synnaeve (Facebook AI Research)
