DocumentCode :
683699
Title :
Automatic Allophone Deriving for Korean Speech Recognition
Author :
Ji Xu ; Yujing Si ; Jielin Pan ; Yonghong Yan
Author_Institution :
Key Lab. of Speech Acoust. & Content Understanding, Beijing, China
fYear :
2013
fDate :
14-15 Dec. 2013
Firstpage :
776
Lastpage :
779
Abstract :
In Korean, the pronunciation of a phoneme is strongly affected by its context. Thus, using phonemes translated directly from their written forms as the basic units for acoustic modeling is problematic, as these units cannot capture the complex pronunciation variations that occur in continuous speech. An allophone, a sub-phone unit in phonetics that is treated as an independent phoneme in speech recognition, is considered able to describe such variations. This paper presents a novel approach called Automatic Allophone Deriving (AAD). In this approach, statistics from Gaussian Mixture Models are used to create measurements for allophone candidates, and decision trees are used to derive allophones. The question set used by the decision trees is also generated automatically, since the approach assumes that no linguistic knowledge is used. This paper also adopts long-time features over conventional cepstral features to capture acoustic information over several hundred milliseconds for AAD, as co-articulation effects are unlikely to be limited to a single phoneme. Experiments show that AAD outperforms previous approaches that derive allophones from linguistic knowledge. Additional experiments use long-time features directly in acoustic modeling. The results show that the performance improvement achieved with the same allophones can be increased significantly by using long-time features, compared with the corresponding baselines.
Keywords :
Gaussian processes; acoustic signal processing; decision trees; natural language processing; speech recognition; AAD approach; Gaussian mixture models; Korean speech recognition; acoustic information capture; acoustic modeling; automatic allophone deriving approach; cepstral features; continuous speech; decision trees; linguistic knowledge; phoneme pronunciation; pronunciation variations; statistics; Acoustics; Clustering algorithms; Context; Decision trees; Hidden Markov models; Pragmatics; Speech recognition; Korean speech recognition; allophone; long-time features;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence and Security (CIS), 2013 9th International Conference on
Conference_Location :
Leshan
Print_ISBN :
978-1-4799-2548-3
Type :
conf
DOI :
10.1109/CIS.2013.169
Filename :
6746537