Regional Information Center for Science and Technology - Automatic segmentation and identification of mixed-language speech using delta-BIC and LSA-based GMMs

DocumentCode :

763665

Title :

Automatic segmentation and identification of mixed-language speech using delta-BIC and LSA-based GMMs

Author :

Wu, Chung-Hsien ; Chiu, Yu-Hsien ; Shia, Chi-Jiun ; Lin, Chun-Yu

Author_Institution :

Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan

Volume :

Issue :

fYear :

2006

Firstpage :

266

Lastpage :

276

Abstract :

This paper proposes an approach to segmenting and identifying mixed-language speech. A delta Bayesian information criterion (delta-BIC) is first applied to segment the input speech utterance into a sequence of language-dependent segments using acoustic features. A VQ-based bigram model is used to characterize the acoustic-phonetic dynamics of two consecutive codewords in a language, so that the language-specific acoustic-phonetic properties of phone sequences are integrated into the identification process. A Gaussian mixture model (GMM) is used to model the codeword occurrence vectors of each language-dependent segment after they have been orthonormally transformed using latent semantic analysis (LSA). A filtering method is used to smooth the hypothesized language sequence, eliminating the noise-like components of the language sequence detected by maximum likelihood estimation. Finally, a dynamic programming method is used to determine the language boundaries globally. Experiments on Mandarin, English, and Taiwanese yielded a recall rate of 0.87 for language boundary segmentation. Based on this recall rate, the proposed approach achieved language identification accuracies of 92.1% and 74.9% for single-language and mixed-language speech, respectively.
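The delta-BIC segmentation step described in the abstract can be illustrated with a minimal sketch. It assumes the standard formulation in which each candidate segment is modelled by a single full-covariance Gaussian and one change point is searched for within a window, with a penalty weight `lam` (λ). The feature type, window length, λ, margin, and the helper names `covariance`, `log_det`, `delta_bic`, and `detect_change` are hypothetical and are not taken from the paper. Only the Python standard library is used.

```python
import math
import random
from typing import List, Optional, Sequence

# Hypothetical sketch: standard full-covariance delta-BIC, not necessarily
# the exact variant, features, or parameters used in the paper.
Frame = Sequence[float]


def covariance(frames: Sequence[Frame]) -> List[List[float]]:
    """Maximum-likelihood (biased) covariance matrix of a list of feature frames."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in frames:
        diff = [f[k] - mean[k] for k in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / n
    return cov


def log_det(matrix: List[List[float]], ridge: float = 1e-6) -> float:
    """Log-determinant of a covariance matrix via Gaussian elimination.

    A small ridge keeps the matrix positive definite, so its determinant is
    positive and summing log|pivot| gives the log-determinant.
    """
    d = len(matrix)
    a = [row[:] for row in matrix]
    for i in range(d):
        a[i][i] += ridge
    total = 0.0
    for col in range(d):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = max(range(col, d), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        total += math.log(abs(a[col][col]))
        for r in range(col + 1, d):
            factor = a[r][col] / a[col][col]
            for c in range(col, d):
                a[r][c] -= factor * a[col][c]
    return total


def delta_bic(frames: Sequence[Frame], t: int, lam: float = 1.0) -> float:
    """Delta-BIC for splitting frames at index t (frames[:t] vs frames[t:]).

    Positive values favour two separate Gaussians, i.e. a boundary at t.
    """
    n, d = len(frames), len(frames[0])
    left, right = frames[:t], frames[t:]
    ratio = 0.5 * (
        n * log_det(covariance(frames))
        - len(left) * log_det(covariance(left))
        - len(right) * log_det(covariance(right))
    )
    # Penalty for the extra mean vector and full covariance of the second model.
    penalty = 0.5 * (d + d * (d + 1) / 2) * math.log(n)
    return ratio - lam * penalty


def detect_change(frames: Sequence[Frame], margin: int = 10,
                  lam: float = 1.0) -> Optional[int]:
    """Return the best boundary index, or None if no split has positive delta-BIC."""
    candidates = range(margin, len(frames) - margin + 1)
    if not candidates:  # window too short to leave `margin` frames on each side
        return None
    best = max(candidates, key=lambda t: delta_bic(frames, t, lam))
    return best if delta_bic(frames, best, lam) > 0 else None


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic 2-D "features": 50 frames around (0, 0), then 50 around (5, 5).
    frames = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
    frames += [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(50)]
    print(detect_change(frames))  # prints the detected boundary index
```

In a complete segmenter this single-change search would typically be repeated over a sliding or growing window so that several boundaries can be found in one utterance; the later stages in the abstract (bigram modelling, LSA-based GMM identification, filtering, and dynamic programming) are not covered by this sketch.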

Keywords :

Bayes methods; Gaussian processes; dynamic programming; filtering theory; maximum likelihood estimation; natural languages; speech intelligibility; speech processing; Gaussian mixture model; acoustic-phonetic dynamics; automatic segmentation; bigram model; codeword occurrence vectors; delta Bayesian information criterion; dynamic programming; filtering method; language boundary segmentation; language codewords; latent semantic analysis; maximum likelihood estimation; mixed-language speech identification; Application software; Bayesian methods; Dynamic programming; Filtering; Maximum likelihood detection; Maximum likelihood estimation; Natural languages; Noise generators; Principal component analysis; Speech analysis; Gaussian mixture model; language identification; latent semantic analysis; mixed-language speech; single-language speech;

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TSA.2005.852992

Filename :

1561283

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=763665