Title :
Progress on Mandarin conversational telephone speech recognition
Author :
Hwang, Mei-Yuh ; Lei, Xin ; Ng, Tim ; Bulyko, Ivan ; Ostendorf, Mari ; Stolcke, Andreas ; Wang, Wen ; Zheng, Jing ; Gadde, V.R.R. ; Graciarena, Martin ; Siu, Man-Hung ; Huang, Yan
Author_Institution :
Washington Univ., Seattle, WA, USA
Abstract :
Over the past decade, there has been good progress on English conversational telephone speech (CTS) recognition, built on the Switchboard and Fisher corpora. In this paper, we present our efforts on extending language-independent technologies into Mandarin CTS, as well as addressing language-dependent issues such as tone. We show the impact of each of the following factors: (a) simplified Mandarin phone set; (b) pitch features; (c) auto-retrieved Web texts for augmenting n-gram training; (d) speaker adaptive training; (e) maximum mutual information estimation; (f) decision-tree-based parameter sharing; (g) cross-word co-articulation modeling; and (h) combining MFCC and PLP decoding outputs using confusion networks. We have reduced the Chinese character error rate (CER) of the BBN-2003 development test set from 53.8% to 46.8% after (a)+(b)+(c)+(f)+(g) are combined. Further reduction in CER is anticipated after integrating all improvements.
Keywords :
decision trees; error statistics; feature extraction; parameter estimation; speaker recognition; speech processing; BBN-2003 development test set; Chinese character error rate; MFCC; Mandarin CTS; Mandarin speech; PLP decoding outputs; auto-retrieved Web texts; confusion networks; conversational telephone speech recognition; cross-word co-articulation modeling; decision-tree-based parameter sharing; maximum mutual information estimation; n-gram training; pitch features; simplified Mandarin phone set; speaker adaptive training; tone; Automatic speech recognition; Hidden Markov models; Maximum likelihood decoding; Maximum likelihood estimation; Maximum likelihood linear regression; Mel frequency cepstral coefficient; Natural languages; Speech recognition; Telephony; Testing;
Conference_Title :
Chinese Spoken Language Processing, 2004 International Symposium on
Print_ISBN :
0-7803-8678-7
DOI :
10.1109/CHINSL.2004.1409571