DocumentCode
110579
Title
Automatic Pronunciation Scoring with Score Combination by Learning to Rank and Class-Normalized DP-Based Quantization
Author
Chen, Liang-Yu ; Jang, Jyh-Shing Roger
Author_Institution
Inst. of Inf. Syst. & Applic., Nat. Tsing Hua Univ., Hsinchu, Taiwan
Volume
23
Issue
11
fYear
2015
fDate
Nov. 2015
Firstpage
1737
Lastpage
1749
Abstract
This paper proposes an automatic pronunciation scoring framework using learning to rank and class-normalized, dynamic-programming-based quantization. The goal is to train a model that is able to grade the pronunciation of a second language learner, such that the predicted score is as close as possible to the one given by a human teacher. Under this framework, each utterance is given a score of 1 to 5 by human raters, which is treated as a ground truth rank for the training algorithm. The corpus was rated by qualified English teachers in Taiwan (nonnative speakers). Nine phone-level scores are computed and converted into word-level scores through four conversion methods. We select the 16 best-performing scores as the input features to train the learning-to-rank function. The output of the function is then quantized to a discrete rank on a 1-5 scale. The quantization is done with class normalization to alleviate the problem of data imbalance over different classes. Experimental results show that the proposed framework achieves a higher correlation to the human scores than other methods, along with higher accuracy in detecting instances of mispronunciation. We also release a new version of our nonnative corpus with human rankings.
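The quantization step in the abstract maps a continuous ranking-function output to a discrete 1-5 rank. The following is a minimal sketch of how class-normalized, DP-based threshold fitting could work, not the authors' implementation. Assumptions: the objective is class-normalized accuracy (a correctly ranked sample of class k contributes 1/n_k, where n_k is that class's size), the ranks are contiguous score intervals, and the training scores are distinct. The names `fit_thresholds` and `quantize` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def fit_thresholds(scores: Sequence[float], labels: Sequence[int],
                   num_classes: int = 5) -> List[float]:
    """Fit num_classes - 1 increasing thresholds by dynamic programming.

    Assumed objective: class-normalized accuracy, i.e. a correctly ranked
    sample of class k contributes 1 / n_k (n_k = size of class k).
    Assumes labels lie in 1..num_classes and scores are distinct.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    s = [scores[i] for i in order]
    y = [labels[i] for i in order]
    n = len(s)
    n_k = Counter(y)

    def gain(j: int, k: int) -> float:
        # Credit for putting the j-th lowest-scored sample into class k.
        return 1.0 / n_k[k] if y[j] == k else 0.0

    # dp[k][j]: best objective when the j lowest-scored samples are split,
    # in score order, into classes 1..k (some classes may be empty).
    dp = [[-math.inf] * (n + 1) for _ in range(num_classes + 1)]
    took = [[False] * (n + 1) for _ in range(num_classes + 1)]
    for k in range(num_classes + 1):
        dp[k][0] = 0.0
    for k in range(1, num_classes + 1):
        for j in range(1, n + 1):
            skip = dp[k - 1][j]                   # sample j-1 falls in a lower class
            take = dp[k][j - 1] + gain(j - 1, k)  # sample j-1 falls in class k
            dp[k][j], took[k][j] = (take, True) if take >= skip else (skip, False)

    # Backtrack to recover the class of each sorted sample.
    assigned = [0] * n
    k, j = num_classes, n
    while j > 0:
        if took[k][j]:
            assigned[j - 1] = k
            j -= 1
        else:
            k -= 1

    thresholds = []
    for k in range(1, num_classes):
        c = sum(1 for a in assigned if a <= k)  # samples ranked k or lower
        if c == 0:
            thresholds.append(-math.inf)
        elif c == n:
            thresholds.append(math.inf)
        else:
            # Place the boundary midway between neighbouring scores.
            thresholds.append((s[c - 1] + s[c]) / 2)
    return thresholds


def quantize(score: float, thresholds: Sequence[float]) -> int:
    """Map a continuous score to a rank in 1..len(thresholds) + 1."""
    return 1 + sum(score > t for t in thresholds)
```

For instance, `fit_thresholds([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [1, 1, 2, 3, 3, 3], num_classes=3)` places the boundaries at about 0.25 and 0.35, so `quantize(0.3, ...)` returns 2. Weighting each hit by 1/n_k keeps a large class from dominating the objective, so a rare class can still receive its own interval; with plain accuracy, an empty interval for a small class is often just as good.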
Keywords
computer based training; dynamic programming; learning (artificial intelligence); natural language processing; Taiwan; automatic pronunciation scoring; class-normalized DP-based quantization; dynamic-programming-based quantization; ground truth rank; learning; phone-level score; score combination; training algorithm; word-level score; Correlation; Hidden Markov models; IEEE transactions; Quantization (signal); Speech; Speech processing; Training; Automatic pronunciation scoring; computer assisted language learning (CALL); computer assisted pronunciation training (CAPT); learning to rank;
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher
ieee
ISSN
2329-9290
Type
jour
DOI
10.1109/TASLP.2015.2449089
Filename
7131475