DocumentCode
683699
Title
Automatic Allophone Deriving for Korean Speech Recognition
Author
Ji Xu ; Yujing Si ; Jielin Pan ; Yonghong Yan
Author_Institution
Key Lab. of Speech Acoust. & Content Understanding, Beijing, China
fYear
2013
fDate
14-15 Dec. 2013
Firstpage
776
Lastpage
779
Abstract
In Korean, the pronunciation of a phoneme is strongly affected by its context. Using phonemes translated directly from their written forms as the basic units for acoustic modeling is therefore problematic, because these units cannot capture the complex pronunciation variations that occur in continuous speech. An allophone, a sub-phone unit in phonetics that is treated as an independent phoneme in speech recognition, is considered able to describe such variations. This paper presents a novel approach called Automatic Allophone Deriving (AAD). In this approach, statistics from Gaussian Mixture Models are used to create measurements for allophone candidates, and decision trees are used to derive the allophones. The question set used by the decision trees is also generated automatically, since the approach assumes that no linguistic knowledge is available. For AAD, the paper also adopts long-time features in place of conventional cepstral features to capture acoustic information over several hundred milliseconds, as co-articulation effects are unlikely to be limited to a single phoneme. Experiments show that AAD outperforms previous approaches that derive allophones from linguistic knowledge. Additional experiments use long-time features directly in acoustic modeling. The results show that, relative to the corresponding baselines, the performance improvement obtained from the same allophones is significantly larger when long-time features are used.
Keywords
Gaussian processes; acoustic signal processing; decision trees; natural language processing; speech recognition; AAD approach; Gaussian mixture models; Korean speech recognition; acoustic information capture; acoustic modeling; automatic allophone deriving approach; cepstral features; continuous speech; decision trees; linguistic knowledge; phoneme pronunciation; pronunciation variations; statistics; Acoustics; Clustering algorithms; Context; Decision trees; Hidden Markov models; Pragmatics; Speech recognition; Korean speech recognition; allophone; long-time features;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence and Security (CIS), 2013 9th International Conference on
Conference_Location
Leshan
Print_ISBN
978-1-4799-2548-3
Type
conf
DOI
10.1109/CIS.2013.169
Filename
6746537