• DocumentCode
    672864
  • Title

    Traditional Chinese parser and language modeling for Mandarin ASR

  • Author

    Ang-Hsing Lin ; Yih-Ru Wang ; Sin-Horng Chen

  • Author_Institution
    Inst. of Commun. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan
  • fYear
    2013
  • fDate
    25-27 Nov. 2013
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    This paper proposes a new traditional Chinese parser to improve language modeling for Mandarin speech recognition. The parser first applies a preprocessing step to correct some word segmentation inconsistencies in the text corpus. It then employs a CRF-based word segmentation method and a CRF-based POS tagger to resegment the texts, generating better word strings for training an n-gram language model (LM) for ASR. Experimental results on the TCC-300 corpus show that the proposed method achieves a word error rate (WER) of 13.4%, a relative WER reduction of about 45% compared with the previous system.
  • Keywords
    natural language processing; speech recognition; text analysis; Chinese parser; LM; Mandarin ASR; Mandarin speech recognition; WER; language model; text corpus; word error rate; word segmentation; Compounds; Decoding; Error analysis; Speech; Speech recognition; Tagging; Training; Chinese word segmentation; Conditional random field; Language model; automatic speech recognition; weighted finite state transducer
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013 International Conference
  • Conference_Location
    Gurgaon
  • Type

    conf

  • DOI
    10.1109/ICSDA.2013.6709889
  • Filename
    6709889