

Afternoon Poster in Workshop: Artificial Intelligence & Human Computer Interaction

Large Language Models as a Proxy For Human Evaluation in Assessing the Comprehensibility of Disordered Speech Transcription

Katrin Tomanek · Jimmy Tobin · Subhashini Venugopalan · Richard Cave · Katie Seaver · Rus Heywood · Jordan Green


Abstract:

Automatic Speech Recognition (ASR) systems, despite significant advances in recent years, still have much room for improvement, particularly in the recognition of disordered speech. Even so, erroneous transcripts from ASR models can help people with disordered speech be better understood. Evaluating the efficacy of ASR for this use case requires a methodology for measuring the impact of transcription errors on the intended meaning and comprehensibility. Human evaluation is the gold standard for this, but it can be laborious, slow, and expensive. Here, we tuned and evaluated large language models (LLMs) and found them to be a better proxy for human evaluators than typical sentence similarity metrics. We further present a case study of using our approach to make ASR model deployment decisions in a live video conversation setting.
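To illustrate the comparison the abstract describes, the sketch below contrasts a typical sentence similarity metric with an LLM-as-proxy judgment of whether an ASR transcript preserves the speaker's intended meaning. This is a minimal, hypothetical sketch, not the authors' method: the similarity metric here is a simple character-level ratio from Python's standard library, the prompt wording and rating scale are illustrative assumptions, and `query_llm` is a placeholder to be replaced with whatever LLM client one actually uses.

```python
"""Sketch: sentence-similarity baseline vs. an LLM-based comprehensibility rating.

Assumptions (not from the paper): the similarity metric, the prompt text,
the 1-5 rating scale, and the `query_llm` stub are all illustrative.
"""
from difflib import SequenceMatcher


def similarity_score(reference: str, transcript: str) -> float:
    """A typical surface-level similarity metric (0.0 to 1.0).

    Metrics like this compare strings or tokens directly and may not
    reflect whether the intended meaning survived the ASR errors.
    """
    return SequenceMatcher(None, reference.lower(), transcript.lower()).ratio()


def build_rating_prompt(reference: str, transcript: str) -> str:
    """Build an illustrative prompt asking an LLM to judge comprehensibility."""
    return (
        "A speaker intended to say:\n"
        f"  {reference}\n"
        "An automatic transcript of their speech reads:\n"
        f"  {transcript}\n"
        "On a scale of 1 (meaning lost) to 5 (meaning fully preserved), "
        "how well does the transcript convey the intended meaning? "
        "Answer with a single integer."
    )


def query_llm(prompt: str) -> str:
    """Placeholder for an LLM call; replace with your own client code."""
    raise NotImplementedError("Plug in an LLM client here.")


if __name__ == "__main__":
    reference = "Could you please open the window a little bit?"
    transcript = "could you lease open the windows a bit"

    print(f"Surface similarity: {similarity_score(reference, transcript):.2f}")
    print("LLM rating prompt:")
    print(build_rating_prompt(reference, transcript))
    # The LLM's integer reply (via query_llm) would serve as the proxy
    # rating to compare against human comprehensibility judgments.
```

In practice, proxy ratings like these would be collected over a set of reference/transcript pairs and correlated with human judgments to decide whether the LLM is a faithful stand-in.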
