Regional Information Center for Science and Technology - Mispronunciation detection and diagnosis in L2 English speech using multi-distribution Deep Neural Networks

DocumentCode :

134351

Title :

Mispronunciation detection and diagnosis in L2 English speech using multi-distribution Deep Neural Networks

Author :

Kun Li ; Meng, Hsiang-Yun

Author_Institution :

Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, Hong Kong, China

fYear :

2014

fDate :

12-14 Sept. 2014

Firstpage :

255

Lastpage :

259

Abstract :

This paper investigates the use of multi-distribution Deep Neural Networks (DNNs) for mispronunciation detection and diagnosis (MD&D). Our existing approach uses extended recognition networks (ERNs) to constrain the recognition paths to the canonical pronunciation of the target words and the likely phonetic mispronunciations. Although this approach is viable, it has some problems: (1) deriving appropriate phonological rules to generate the ERNs remains a challenging task; (2) the acoustic model (AM) and the phonological rules are trained independently and hence contextual information is lost; and (3) phones missing from the ERNs cannot be recognized even if we have a well-trained AM. Hence we propose an Acoustic Phonological Model (APM) using a multi-distribution DNN, whose input features include acoustic features and corresponding canonical pronunciations. The APM can implicitly learn the phonological rules from the canonical productions and annotated mispronunciations in the training data. Furthermore, the APM can also capture the relationships between the phonological rules and related acoustic features. As we do not restrict any pathways as in the ERNs, all phones can be recognized if we have a perfect APM. Experiments show that our method achieves an accuracy of 83.3% and a correctness of 88.5%. It significantly outperforms the approach of forced alignment with ERNs, whose correctness is 75.9%.

Keywords :

acoustic signal processing; neural nets; speech processing; speech recognition; APM; ERN; L2 English speech; MD&D; acoustic features; acoustic phonological model; canonical pronunciations; extended recognition networks; mispronunciation detection and diagnosis; multidistribution DNN; multidistribution deep neural networks; Accuracy; Acoustics; Hidden Markov models; Neural networks; Speech; Speech recognition; Training; deep neural networks

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on

Conference_Location :

Singapore

Type :

conf

DOI :

10.1109/ISCSLP.2014.6936724

Filename :

6936724

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=134351