Keynote in Affinity Workshop: LatinX in AI (LXAI) Research Workshop
Speech Recognition & Synthesis for Language in Low Data Regimes: Learning from Few Speakers using Multilingual Models
Moacir Ponti
Speech recognition offers promising benefits for business and personal applications. Although automatic speech recognition (ASR) systems have improved significantly with deep learning methods, ASR remains an open research problem. Many languages still lack open, publicly available resources, and this results in low-quality ASR systems. In this talk, we describe a multi-speaker text-to-speech (TTS) system for scenarios in which only a few speakers are available. By exploring flow-based and multilingual models, we can leverage data from languages with many available speakers and make TTS viable for languages with less data. Additionally, we show how this model can be applied to improve ASR systems in two target languages, simulating a scenario in which only one speaker is available. This enables many applications, such as deploying TTS for people with voice disorders, or building ASR for languages with extremely little available data, such as native and regional languages.