DocumentCode
672864
Title
Traditional Chinese parser and language modeling for Mandarin ASR
Author
Ang-Hsing Lin ; Yih-Ru Wang ; Sin-Horng Chen
Author_Institution
Inst. of Commun. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan
fYear
2013
fDate
25-27 Nov. 2013
Firstpage
1
Lastpage
5
Abstract
This paper proposes a new traditional Chinese parser for improving the language modeling of Mandarin speech recognition. The parser first applies a preprocessing step to correct some word segmentation inconsistencies in the text corpus. It then employs a CRF-based word segmenter and a CRF-based POS tagger to resegment the texts, producing better word strings for training the n-gram language model (LM) used in ASR. Experimental results on the TCC-300 corpus showed that the proposed method achieved a word error rate (WER) of 13.4%, a relative WER reduction of about 45% compared with the previous system.
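To illustrate how resegmented text feeds an n-gram LM, the sketch below assumes the CRF segmenter labels each character with one tag from the common B/M/E/S scheme (Begin, Middle, End of a multi-character word, or Single-character word). The abstract does not state the paper's actual tag set. The hypothetical helpers `tags_to_words` and `bigram_mle` join tagged characters into words and then estimate unsmoothed maximum-likelihood bigram probabilities. A real ASR language model would use higher orders and smoothing (e.g. Kneser-Ney), so treat this only as a minimal illustration, not the authors' implementation.

```python
from collections import Counter
from typing import Dict, List, Tuple

def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Join characters into words using per-character B/M/E/S tags (assumed scheme).

    B = begins a multi-character word, M = inside it, E = ends it,
    S = a single-character word.
    """
    if len(chars) != len(tags):
        raise ValueError("need exactly one tag per character")
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag == "S":
            if current:  # flush an unterminated word (malformed tag sequence)
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":
            if current:  # previous word never got an E tag; keep it anyway
                words.append(current)
            current = ch
        elif tag in ("M", "E"):
            current += ch
            if tag == "E":
                words.append(current)
                current = ""
        else:
            raise ValueError(f"unknown tag: {tag}")
    if current:
        words.append(current)
    return words

def bigram_mle(sentences: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """Unsmoothed maximum-likelihood bigram estimates P(w2 | w1).

    Each sentence is padded with <s> and </s> sentence-boundary tokens.
    """
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        history_counts.update(padded[:-1])  # every token that precedes another
        bigram_counts.update(zip(padded[:-1], padded[1:]))
    return {
        (w1, w2): count / history_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }

if __name__ == "__main__":
    # Tags as a CRF segmenter might emit them for 語言模型很重要
    words = tags_to_words("語言模型很重要", ["B", "E", "B", "E", "S", "B", "E"])
    print(words)
    probs = bigram_mle([words, ["語言", "很", "重要"]])
    print(probs[("語言", "模型")], probs[("語言", "很")])
```

Because the LM is trained on word strings, any segmentation inconsistency (the same word split differently in different sentences) scatters its counts across several n-gram entries. This is why the abstract's preprocessing and CRF resegmentation steps matter for WER.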
Keywords
natural language processing; speech recognition; text analysis; Chinese parser; LM; Mandarin ASR; Mandarin speech recognition; WER; language model; text corpus; word error rate; word segmentation; Compounds; Decoding; Error analysis; Speech; Speech recognition; Tagging; Training; Chinese word segmentation; Conditional random field; Language model; automatic speech recognition; weighted finite state transducer;
fLanguage
English
Publisher
ieee
Conference_Titel
Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013 International Conference
Conference_Location
Gurgaon
Type
conf
DOI
10.1109/ICSDA.2013.6709889
Filename
6709889