

Keynote in Affinity Workshop: LatinX in AI (LXAI) Research Workshop

Speech Recognition & Synthesis for Language in Low Data Regimes: Learning from Few Speakers using Multilingual Models

Moacir Ponti


Abstract:

Speech recognition offers promising benefits for business and personal applications. Although automatic speech recognition (ASR) systems have improved significantly with deep learning methods, ASR remains an open research problem. Many languages still lack open, public resources, which results in low-quality ASR systems. This talk describes a multi-speaker text-to-speech (TTS) system for scenarios with few available speakers. By exploring flow-based and multilingual models, it is possible to leverage data from languages with many available speakers and make multi-speaker TTS viable for languages with less data. Additionally, we show how this model can be applied to improve ASR systems in two target languages, simulating a scenario with only one speaker available. This enables applications such as deploying TTS for people with voice disorders, or ASR in settings with extremely little available data, such as native and regional languages.
