Automatic Allophone Deriving for Korean Speech Recognition

Author

Ji Xu ; Yujing Si ; Jielin Pan ; Yonghong Yan

Author_Institution

Key Lab. of Speech Acoust. & Content Understanding, Beijing, China

fYear

2013

fDate

14-15 Dec. 2013

Firstpage

776

Lastpage

779

Abstract

In Korean, the pronunciation of a phoneme is strongly affected by its context. Using phonemes taken directly from their written forms as the basic units for acoustic modeling is therefore problematic, because such units cannot capture the complex pronunciation variations that occur in continuous speech. Allophones, which are sub-phone units in phonetics but are treated as independent phonemes in speech recognition, are considered able to describe these variations. This paper presents a novel approach called Automatic Allophone Deriving (AAD). In this approach, statistics from Gaussian Mixture Models are used to create measurements for allophone candidates, and decision trees are used to derive the allophones. The question set used by the decision trees is also generated automatically, since the approach assumes that no linguistic knowledge is available. For AAD, the paper also adopts long-time features rather than conventional cepstral features, capturing acoustic information that spans several hundred milliseconds, because co-articulation effects are unlikely to be limited to a single phoneme. Experiments show that AAD outperforms previous approaches that derive allophones from linguistic knowledge. Additional experiments use long-time features directly in acoustic modeling; the results show that, relative to the corresponding baselines, the performance improvement obtained with the same allophones is significantly larger when long-time features are used.
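The abstract does not say exactly which GMM statistics, split measurement, or question-generation procedure AAD uses, so the sketch below is not the authors' method. It is a minimal illustration, under stated assumptions, of the general idea: frames of one base phoneme are labeled with a context (here a hypothetical single neighbouring phone), a question set is derived from the data alone, and a decision tree splits contexts into leaves that act as derived allophones. It swaps in two common stand-ins: the single diagonal Gaussian log-likelihood gain used in standard phonetic decision-tree state tying, and a hypothetical agglomerative clustering of per-context means as the knowledge-free question generator. The names `generate_questions`, `grow_tree`, `VAR_FLOOR`, `min_gain`, and `min_count` are all illustrative; the feature vectors could be cepstral or long-time features.

```python
import math
from collections import defaultdict

VAR_FLOOR = 1e-3  # assumed variance floor, not from the paper


def log_likelihood(frames):
    """ML log-likelihood of the frames under one diagonal Gaussian."""
    if not frames:
        return 0.0
    n = len(frames)
    dim = len(frames[0][1])
    s, ss = [0.0] * dim, [0.0] * dim
    for _, x in frames:
        for d, v in enumerate(x):
            s[d] += v
            ss[d] += v * v
    total = 0.0
    for d in range(dim):
        mean = s[d] / n
        var = max(ss[d] / n - mean * mean, VAR_FLOOR)
        total += math.log(2 * math.pi * var) + 1.0
    return -0.5 * n * total


def generate_questions(frames):
    """Hypothetical knowledge-free question set: every cluster formed while
    agglomeratively merging contexts by the distance between their means."""
    groups = defaultdict(list)
    for ctx, x in frames:
        groups[ctx].append(x)
    clusters = [({c}, [sum(col) / len(xs) for col in zip(*xs)])
                for c, xs in groups.items()]
    questions = [frozenset(c) for c, _ in clusters]
    while len(clusters) > 2:  # merging the last two would give the full set
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = math.dist(clusters[i][1], clusters[j][1])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        (ci, mi), (cj, mj) = clusters[i], clusters[j]
        ni, nj = len(ci), len(cj)
        merged_mean = [(a * ni + b * nj) / (ni + nj) for a, b in zip(mi, mj)]
        del clusters[j], clusters[i]  # delete the higher index first
        clusters.append((ci | cj, merged_mean))
        questions.append(frozenset(ci | cj))
    return questions


def grow_tree(frames, questions, min_gain=1.0, min_count=5):
    """Greedy binary splitting by log-likelihood gain; each returned leaf
    (a sorted list of contexts) plays the role of one derived allophone."""
    parent_ll = log_likelihood(frames)
    best = None
    for q in questions:
        yes = [f for f in frames if f[0] in q]
        no = [f for f in frames if f[0] not in q]
        if len(yes) < min_count or len(no) < min_count:
            continue
        gain = log_likelihood(yes) + log_likelihood(no) - parent_ll
        if best is None or gain > best[0]:
            best = (gain, yes, no)
    if best is None or best[0] < min_gain:
        return [sorted({c for c, _ in frames})]
    _, yes, no = best
    return (grow_tree(yes, questions, min_gain, min_count)
            + grow_tree(no, questions, min_gain, min_count))
```

As a usage example with synthetic data, if contexts "a" and "b" produce frames near 0 while "c" and "d" produce frames near 5, the tree stops after one split and yields two leaves, {a, b} and {c, d}. The gain threshold and minimum occupancy control how many allophones are derived. In a real system these would be tuned, and the statistics would come from trained models rather than raw frames.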

Keywords

Gaussian processes; acoustic signal processing; decision trees; natural language processing; speech recognition; AAD approach; Gaussian mixture models; Korean speech recognition; acoustic information capture; acoustic modeling; automatic allophone deriving approach; cepstral features; continuous speech; decision trees; linguistic knowledge; phoneme pronunciation; pronunciation variations; statistics; Acoustics; Clustering algorithms; Context; Decision trees; Hidden Markov models; Pragmatics; Speech recognition; Korean speech recognition; allophone; long-time features;

fLanguage

English

Publisher

IEEE

Conference_Title

Computational Intelligence and Security (CIS), 2013 9th International Conference on

Conference_Location

Leshan

Print_ISBN

978-1-4799-2548-3

Type

conf

DOI

10.1109/CIS.2013.169

Filename

6746537