DocumentCode :
2769138
Title :
Non-native pronunciation variation modeling using an indirect data driven method
Author :
Kim, Mina ; Oh, Yoo Rhee ; Kim, Hong Kook
Author_Institution :
Gwangju Inst. of Sci. & Technol., Gwangju
fYear :
2007
fDate :
9-13 Dec. 2007
Firstpage :
231
Lastpage :
236
Abstract :
In this paper, we propose a pronunciation variation modeling method that improves the performance of a non-native automatic speech recognition (ASR) system without degrading the performance of a native ASR system. The proposed method is based on an indirect data-driven approach, in which pronunciation variability is investigated from the training speech data, and variant rules are subsequently derived and applied to compensate for that variability in the ASR pronunciation dictionary. To this end, native utterances are first recognized using a phoneme recognizer, and the variant phoneme patterns of native speech are then obtained by aligning the recognized and reference phonetic sequences. The reference sequences are transcribed using each of the canonical, knowledge-based, and hand-labeled methods. Similarly, the variant phoneme patterns of non-native speech are obtained by recognizing non-native utterances and comparing the recognized phoneme sequences with the reference phonetic transcriptions. Finally, variant rules are derived from the native and non-native variant phoneme patterns using decision trees and are applied to adapt a dictionary for both non-native and native ASR systems. In this paper, Korean spoken by native speakers of Chinese is considered as the non-native speech. Non-native ASR experiments show that an ASR system using the dictionary constructed by the proposed method achieves a relative reduction of 18.5% in average word error rate (WER) compared to the baseline ASR system using a canonically transcribed dictionary. In addition, the WER of a native ASR system using the proposed dictionary is relatively reduced by 1.1% compared to the baseline native ASR system with a canonically constructed dictionary.
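The alignment step described in the abstract, comparing recognized phoneme sequences against reference transcriptions to collect variant patterns, can be illustrated with a minimal sketch. It assumes a plain Levenshtein alignment with unit costs and counts substitutions only; the function names `align` and `count_variants` and the phoneme symbols are hypothetical and are not taken from the paper, which derives its rules with decision trees rather than raw counts.

```python
from collections import Counter


def align(ref, hyp):
    """Levenshtein-align two phoneme lists (unit costs, an assumption).

    Returns (ref_phone, hyp_phone) pairs. None in the second slot marks
    a deleted reference phone; None in the first marks an inserted one.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner, preferring match/substitution.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def count_variants(utterances):
    """Count (reference phone, recognized phone) substitutions over a corpus.

    `utterances` is an iterable of (reference, recognized) phoneme lists.
    """
    counts = Counter()
    for ref, hyp in utterances:
        for r, h in align(ref, hyp):
            if r is not None and h is not None and r != h:
                counts[(r, h)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical phoneme sequences, for illustration only.
    corpus = [(["g", "a", "m"], ["k", "a", "m"]),
              (["g", "o"], ["k", "o"])]
    print(count_variants(corpus))
```

Counts of this kind would only be raw material: in the method described above, the patterns feed decision trees that produce context-dependent variant rules for the dictionary.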
Keywords :
decision trees; dictionaries; learning (artificial intelligence); natural languages; speaker recognition; vocabulary; ASR pronunciation dictionary; Chinese native speakers; automatic speech recognition system; decision trees; indirect data driven method; native speech phoneme pattern; nonnative pronunciation variation modeling; phoneme recognizer; reference phonetic sequence; word error rate; Adaptation model; Automatic speech recognition; Decision trees; Degradation; Dictionaries; Error analysis; Pattern recognition; Speech recognition; Training data; Vocabulary; Speech recognition; dictionary adaptation; indirect data-driven approach; non-native speech recognition; pronunciation variation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4244-1746-9
Electronic_ISBN :
978-1-4244-1746-9
Type :
conf
DOI :
10.1109/ASRU.2007.4430114
Filename :
4430114