

IBM

Expo Demonstration

Excitement-Driven AI Sports Commentary Generation

Yang Zhang

West Exhibition Hall A-B1
Mon 14 Jul 4 p.m. PDT — 8 p.m. PDT

Abstract:

Speech language models are language models that can process and understand speech. A key capability for such models is capturing the intricate interdependency between content and prosody, which many existing approaches fail to do satisfactorily. We propose a speech language model that explicitly represents prosody information and its relationship with text, and can therefore generate expressive speech appropriate to the context.

In this demo, we combine our speech modeling technology with multi-modal language models into an expressive AI sports commentary generation system. The system analyzes tennis game videos and generates expressive play-by-play speech commentary. Notably, the system can detect the excitement level of the play from crowd and player reactions and adjust the excitement level of the generated speech accordingly.
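
The abstract does not describe how excitement is measured or applied, so the following is only a rough illustrative sketch of the general idea: two reaction scores are fused into one excitement level, which is then mapped to speech controls. Every name here (`fuse_excitement`, `prosody_for`, `ProsodySettings`), the weighting, and the scale values are hypothetical. The demo itself uses a speech language model with an explicit prosody representation, not hand-set rules like these.

```python
from dataclasses import dataclass


@dataclass
class ProsodySettings:
    """Hypothetical speech controls; 1.0 means neutral delivery."""
    pitch_scale: float
    rate_scale: float
    energy_scale: float


def fuse_excitement(crowd_score: float, player_score: float,
                    crowd_weight: float = 0.6) -> float:
    """Combine two reaction scores into one level clamped to [0, 1].

    The 0.6 weighting is an arbitrary illustrative choice.
    """
    fused = crowd_weight * crowd_score + (1.0 - crowd_weight) * player_score
    return min(1.0, max(0.0, fused))


def prosody_for(excitement: float) -> ProsodySettings:
    """Linearly interpolate between a calm and an excited preset."""
    t = min(1.0, max(0.0, excitement))
    calm = ProsodySettings(1.0, 1.0, 1.0)
    excited = ProsodySettings(1.3, 1.2, 1.5)  # assumed values

    def lerp(a: float, b: float) -> float:
        return a + t * (b - a)

    return ProsodySettings(
        pitch_scale=lerp(calm.pitch_scale, excited.pitch_scale),
        rate_scale=lerp(calm.rate_scale, excited.rate_scale),
        energy_scale=lerp(calm.energy_scale, excited.energy_scale),
    )


if __name__ == "__main__":
    # A loud crowd with a subdued player reaction.
    level = fuse_excitement(crowd_score=0.9, player_score=0.4)
    print(level, prosody_for(level))
```

In a learned system the mapping from excitement to prosody would come from the model rather than from fixed presets; the sketch only makes the detect-then-adjust flow concrete.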
