Title :
Sub-structure-based estimation of pronunciation proficiency and classification of learners
Author :
Suzuki, Masayuki ; Minematsu, Nobuaki ; Luo, Dean ; Hirose, Keikichi
Author_Institution :
Univ. of Tokyo, Tokyo, Japan
Date :
Nov. 13 2009-Dec. 17 2009
Abstract :
Automatic estimation of pronunciation proficiency poses a specific difficulty. How adequately a learner controls the vocal organs can be estimated from the spectral envelopes of input utterances, but the envelope patterns are also easily affected by speaker differences. To develop a pedagogically sound method for automatic estimation, envelope changes caused by linguistic factors must be properly separated from those caused by extra-linguistic factors. To this end, in our previous study [1], we proposed a mathematically guaranteed and linguistically valid speaker-invariant representation of pronunciation, called speech structure. Since that proposal, we have also examined the representation for ASR [2], [3], [4] and, through that work, have learned how better to apply speech structures to various tasks. In this paper, we focus on the proficiency estimation experiment reported in [1] and, using our recently proposed techniques for the structures, carry out that experiment again under new conditions. Specifically, we use smaller units of structural analysis, speaker-invariant substructures, and relative structural distances between a learner and a teacher. Results show that correlations between human and machine ratings improve, and that the method is far more robust to speaker differences than the widely used GOP scores. We further demonstrate that the proposed representation can classify learners purely on the basis of their pronunciation proficiency, unaffected by age or gender.
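The idea of a speaker-invariant structure and a learner-to-teacher structural distance can be illustrated with a minimal sketch. The abstract does not state how the structure is computed, so everything below is an assumption made for illustration: each sound unit is modeled as a Gaussian, the structure is the matrix of pairwise Bhattacharyya distances, speaker differences are approximated by an invertible affine transform, and the structural distance is a root-mean-square difference of matrix entries. The function names and toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussians (assumed distance measure)."""
    cov = (cov1 + cov2) / 2.0
    diff = mu1 - mu2
    term_mean = diff @ np.linalg.solve(cov, diff) / 8.0
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term_mean + term_cov

def structure(means, covs):
    """Symmetric matrix of pairwise distances between all sound units."""
    n = len(means)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = bhattacharyya(means[i], covs[i], means[j], covs[j])
    return dist

def structural_distance(s1, s2):
    """RMS difference over the upper triangles of two structures (assumed form)."""
    upper = np.triu_indices_from(s1, k=1)
    return float(np.sqrt(np.mean((s1[upper] - s2[upper]) ** 2)))

# Toy "teacher": three sound units in a 2-D feature space (hypothetical data).
teacher_means = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 2.0])]
teacher_covs = [np.eye(2) for _ in teacher_means]

# Same pronunciation from a "different speaker": invertible affine map x -> A x + b.
A = np.array([[2.0, 0.5], [0.3, 1.5]])
b = np.array([1.0, -2.0])
other_means = [A @ m + b for m in teacher_means]
other_covs = [A @ c @ A.T for c in teacher_covs]

# Toy "learner": the second unit is pronounced too close to the first.
learner_means = [np.array([0.0, 0.0]), np.array([0.5, 0.0]), np.array([0.0, 2.0])]
learner_covs = [np.eye(2) for _ in learner_means]

s_teacher = structure(teacher_means, teacher_covs)
s_other = structure(other_means, other_covs)
s_learner = structure(learner_means, learner_covs)

same_pronunciation = structural_distance(s_teacher, s_other)
different_pronunciation = structural_distance(s_teacher, s_learner)
```

In this sketch, `same_pronunciation` is zero up to floating-point error because the Bhattacharyya distance between Gaussians is unchanged by an invertible affine map, whereas `different_pronunciation` is clearly positive because the learner's contrast between the first two units has shrunk.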
Keywords :
classification; computer aided instruction; estimation theory; speaker recognition; speech processing; automatic estimation; input utterances; learner classification; linguistic factors; pronunciation proficiency estimation; speaker-invariant substructures; spectral envelope patterns; speech structure; structural analysis; vocal organs; Automatic control; Automatic speech recognition; Humans; Loudspeakers; Proposals; Robustness;
Conference_Title :
Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
Conference_Location :
Merano
Print_ISBN :
978-1-4244-5478-5
Electronic_ISBN :
978-1-4244-5479-2
DOI :
10.1109/ASRU.2009.5373275