DocumentCode :
3424776
Title :
Exploiting prosodic and lexical features for tone modeling in a conditional random field framework
Author :
Wei, Hongxiu ; Wang, Xinhao ; Wu, Hao ; Luo, Dingsheng ; Wu, Xihong
Author_Institution :
Speech & Hearing Res. Center, Peking Univ., Beijing
fYear :
2008
fDate :
March 31 2008-April 4 2008
Firstpage :
4549
Lastpage :
4552
Abstract :
Tonal cues play an important role in distinguishing ambiguous words in Mandarin speech recognition. This paper explores an innovative tone modeling framework using prosodic and lexical features, as well as syllable context information. A discriminative model, namely a Conditional Random Field (CRF), is adopted, which is sufficiently flexible to handle multiple interacting features and long-range dependencies of observations. After the first-pass search of a recognition system, the CRF-based tone models are employed to rerank N-best hypotheses according to tonal scores, which represent the correctness of the tone sequence given each candidate hypothesis and the observed speech signal. Experimental results show that the tonal cues help to achieve 7.8% and 8.6% relative reductions in character error rate on two widely used Mandarin speech recognition tasks, the Hub-4 test and the 863 test.
Keywords :
speech recognition; Mandarin speech recognition; conditional random field framework; lexical features; tone modeling; Context modeling; Data mining; Feature extraction; Lattices; Mel frequency cepstral coefficient; Pattern recognition; Support vector machine classification; Support vector machines; Testing; CRF; reranking
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
ISSN :
1520-6149
Print_ISBN :
978-1-4244-1483-3
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2008.4518668
Filename :
4518668