Abstract:
In this paper, we propose a Neural Network (NN)-based Logistic Regression (LR) classifier for improving the phone mispronunciation detection rate in a Computer-Aided Language Learning (CALL) system. A general neural network with multiple hidden layers for extracting useful speech features is first trained on pooled training data; phone-dependent, 2-class logistic regression classifiers are then trained as individual, phoneme-specific nodes at the output layer. This new NN-based classifier with shared hidden layers streamlines the time-consuming work of training multiple individual classifiers separately, i.e., one for each phoneme, and learns a common feature representation via the shared hidden layers. Its improved performance over independently trained, phoneme-specific classifiers is verified on a test database of isolated English words recorded by non-native English learners. Compared with the conventional Goodness of Pronunciation (GOP)-based approach, the NN-based LR classifier improves precision and recall by 37.1% and 11.7% (absolute), respectively. On the same test data, it also outperforms a Support Vector Machine (SVM)-based classifier, which is widely used for mispronunciation detection: at a slightly better precision, recall is improved by 10.6% absolute (21.6% relative).
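The architecture described above, shared hidden layers feeding one 2-class logistic regression node per phoneme, can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation. The class name `SharedHiddenLRClassifier`, the layer sizes, the sigmoid hidden activation, the random initialization, and the reading of the output as a mispronunciation probability are all assumptions made for illustration, and training is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    # One fully connected layer with a sigmoid activation (assumed choice).
    # `weights` holds one row of input weights per output unit.
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    # Small random weights and zero biases; a placeholder for trained values.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


class SharedHiddenLRClassifier:
    """Hypothetical sketch: shared hidden layers plus per-phone LR output nodes."""

    def __init__(self, n_features, hidden_sizes, phones, seed=0):
        rng = random.Random(seed)
        # Hidden layers are shared by all phones (trained on pooled data).
        self.hidden = []
        n_in = n_features
        for n_out in hidden_sizes:
            self.hidden.append(init_layer(n_in, n_out, rng))
            n_in = n_out
        # One 2-class logistic regression node per phone at the output layer.
        self.phone_nodes = {p: init_layer(n_in, 1, rng) for p in phones}

    def shared_features(self, x):
        h = x
        for weights, biases in self.hidden:
            h = dense(h, weights, biases)
        return h

    def mispronunciation_prob(self, x, phone):
        # Only the node belonging to `phone` is evaluated on the shared features.
        weights, biases = self.phone_nodes[phone]
        return dense(self.shared_features(x), weights, biases)[0]


if __name__ == "__main__":
    model = SharedHiddenLRClassifier(n_features=4, hidden_sizes=[8, 8], phones=["ae", "th"])
    print(model.mispronunciation_prob([0.2, -0.1, 0.5, 0.0], "th"))
```

In this sketch, adding a phoneme only adds one entry to `phone_nodes`, which mirrors the claim that the shared layers avoid training a full classifier per phoneme.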
Keywords:
feature extraction; neural nets; regression analysis; signal classification; speech recognition; 2-class logistic regression classifiers; CALL system; English words; GOP-based approach; L2 language learners; NN-based LR classifier; SVM-based classifier; computer-aided language learning; feature representation; goodness-of-pronunciation; mispronunciation detection; neural network based logistic regression classifier; speech feature extraction; support vector machine; Acoustics; Artificial neural networks; Hidden Markov models; Logistics; Support vector machines; Training; CALL; Deep Neural Network; Logistic Regression; Mispronunciation Detection;