Acoustic space partition based on broad phonetic class for ensemble acoustic modeling

Author

Xugang Lu ; Yu Tsao ; Matsuda, Shodai ; Hori, Chiori ; Kashioka, Hideki

Author_Institution

Nat. Inst. of Inf. & Commun. Technol., Japan

fYear

2012

fDate

5-8 Dec. 2012

Firstpage

311

Lastpage

314

Abstract

Ensemble acoustic modeling can be used to model the different factors that cause variability in the acoustic space, and to provide different combinations that improve the performance of automatic speech recognition (ASR). One of the main concerns is how to partition the training data set into several subsets on which the ensemble models are trained. In this study, we focus on ensemble acoustic modeling that addresses acoustic variability caused by gender and accent for Chinese large vocabulary continuous speech recognition (LVCSR). Considering that gender and accent information may be encoded in local acoustic realizations of a few specific phonetic classes rather than in a global acoustic distribution, we proposed an acoustic space partition method based on broad phonetic class (BPC) modeling of speakers for ensemble acoustic modeling. Using principal component analysis (PCA) of the BPC-based speaker representation, we designed two-level hierarchical data partitions in the low-dimensional speaker factor space that are related to gender and accent information. Ensemble acoustic models were trained on the partitioned data sets at both levels. Speech recognition results showed that acoustic models trained on the first-level and second-level partitions achieved 9.73% and 32.29% relative improvements in character error reduction rate, respectively.
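
The partitioning idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the BPC-based speaker vectors are built or which rule splits the speaker factor space, so everything below is an assumption made for illustration: the speaker vectors are synthetic, the first level splits on the sign of the first principal component, and the second level splits each first-level group at its median along the second component. The names `pca_project` and `two_level_partition` are hypothetical.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def two_level_partition(Z):
    """Hypothetical splitting rule (not taken from the paper).

    Level 1: sign of component 1 (two groups, labels 0 and 1).
    Level 2: each level-1 group is split at its own median of
    component 2 (four groups, labels 0 to 3).
    """
    level1 = (Z[:, 0] > 0).astype(int)
    level2 = np.zeros(len(Z), dtype=int)
    for g in (0, 1):
        idx = np.where(level1 == g)[0]
        if idx.size == 0:
            continue
        median = np.median(Z[idx, 1])
        level2[idx] = 2 * g + (Z[idx, 1] > median).astype(int)
    return level1, level2

# Synthetic stand-in for BPC-based speaker vectors: one row per speaker.
rng = np.random.default_rng(0)
n_speakers, dim = 40, 12
X = rng.normal(size=(n_speakers, dim))
X[:20, 0] += 10.0  # inject one dominant two-group structure

Z = pca_project(X, n_components=2)
level1, level2 = two_level_partition(Z)
```

In a setup like this, one acoustic model would be trained per `level1` label and per `level2` label, giving the two levels of ensemble models. Whether the leading components actually align with gender and accent depends on the real speaker representation, which this sketch does not reproduce.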

Keywords

acoustic signal processing; gender issues; learning (artificial intelligence); principal component analysis; speaker recognition; speech processing; vocabulary; ASR; BPC based speaker representation; Chinese large vocabulary continuous speech recognition; LVCSR; PCA; accent information; acoustic space partition method; acoustic variability; automatic speech recognition; broad phonetic class modeling; character error reduction rate; ensemble acoustic modeling; gender; local acoustic realization; low dimensional speaker factor space; principal component analysis; speaker modeling; subsets; training data partition; two level hierarchical data partitions; Abstracts; Data models; Decision support systems; Jacobian matrices; Speech; Speech recognition; Standards; Ensemble modeling; acoustic space partition; speaker clustering; speech recognition;

fLanguage

English

Publisher

IEEE

Conference_Titel

Chinese Spoken Language Processing (ISCSLP), 2012 8th International Symposium on

Conference_Location

Kowloon

Print_ISBN

978-1-4673-2506-6

Electronic_ISBN

978-1-4673-2505-9

Type

conf

DOI

10.1109/ISCSLP.2012.6423501

Filename

6423501