Automatic Pronunciation Error Detection and Correction of the Holy Quran's Learners Using Deep Learning
Abstract
Assessing spoken language is challenging, and quantifying pronunciation metrics for machine learning models is even harder. For the Holy Quran, however, the task is made tractable by the rigorous recitation rules (Tajweed) established by Muslim scholars, which make highly effective assessment possible. Despite this advantage, the scarcity of high-quality annotated data remains a significant barrier. In this work, we bridge these gaps by introducing: (1) a 98\% automated pipeline for producing high-quality Quranic datasets, covering the collection of recitations from expert reciters, segmentation at pause points (waqf) using our fine-tuned wav2vec2-BERT model, transcription of the segments, and transcript verification via our novel Tasmeea algorithm; (2) a dataset of 848 hours of audio (286K annotated utterances); (3) qdat_bench, a benchmark of 159 samples containing real recitation errors, covering phonemes, diacritization, and Tajweed rules (Ghunnah, Qalqalah, Madd); and (4) a novel ASR-based approach to pronunciation error detection that uses our custom Quran Phonetic Script (QPS), which, unlike the IPA standard used for Modern Standard Arabic, encodes Tajweed rules. QPS is an 11-level script comprising a phoneme level (encoding Arabic letters with their short and long vowels) and sifat levels (encoding the articulation characteristics of every phoneme). We further present comprehensive modeling experiments with our novel multi-level CTC model, which achieved average Phoneme Error Rates (PER) of 0.21\% on the test set and 1.94\% on qdat_bench, with a 75.8\% Tajweed F1 score. We release our work as open source: \url{https://obadx.github.io/quran-muaalem/en/}
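PER is conventionally computed from the edit distance between a predicted and a reference phoneme sequence: PER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions needed to turn the reference of N phonemes into the prediction. The Python sketch below illustrates that standard definition under stated assumptions. Phoneme sequences are plain lists of token strings, and the tokens in the tests are hypothetical Latin-letter stand-ins rather than actual QPS symbols. It also assumes "average PER" means the mean of per-utterance PER values, since the abstract does not say whether averaging is per utterance or over the whole corpus. It is not the authors' evaluation code.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    # prev[j] holds the distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER = (S + D + I) / N, with N the number of reference phonemes."""
    if not ref:
        raise ValueError("reference sequence must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def average_per(pairs: Sequence[tuple[Sequence[str], Sequence[str]]]) -> float:
    # Assumption: macro average, i.e. the mean of per-utterance PER values.
    return sum(phoneme_error_rate(r, h) for r, h in pairs) / len(pairs)
```

For example, a prediction that drops one phoneme from a five-phoneme reference has one deletion, giving a PER of 1/5 = 20%. Averaging that utterance with a perfectly recognized one gives an average PER of 10%.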