DocumentCode :
672864
Title :
Traditional Chinese parser and language modeling for Mandarin ASR
Author :
Ang-Hsing Lin ; Yih-Ru Wang ; Sin-Horng Chen
Author_Institution :
Inst. of Commun. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan
fYear :
2013
fDate :
25-27 Nov. 2013
Firstpage :
1
Lastpage :
5
Abstract :
This paper proposes a new traditional Chinese parser for improving the language modeling of Mandarin speech recognition. The parser first applies a preprocessing step to correct some word segmentation inconsistencies in the text corpus. It then employs a CRF-based word segmentation method and a CRF-based POS tagger to resegment the texts, generating better word strings for training an n-gram language model (LM) for ASR. Experimental results on the TCC-300 corpus show that the proposed method achieves a word error rate (WER) of 13.4%, a relative WER reduction of about 45% compared with the previous system.
Keywords :
natural language processing; speech recognition; text analysis; Chinese parser; LM; Mandarin ASR; Mandarin speech recognition; WER; language model; text corpus; word error rate; word segmentation; Compounds; Decoding; Error analysis; Speech; Speech recognition; Tagging; Training; Chinese word segmentation; Conditional random field; Language model; automatic speech recognition; weighted finite state transducer;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013 International Conference
Conference_Location :
Gurgaon
Type :
conf
DOI :
10.1109/ICSDA.2013.6709889
Filename :
6709889