• DocumentCode
    2018241
  • Title
    Data-driven lexicon refinement using local and web resources for Chinese speech recognition
  • Author
    Zhang, Hua ; Zhu, Xuan ; Su, Teng-Rong ; Eom, Ki-Wan ; Lee, Jae-Won
  • Author_Institution
    China Samsung Telecom R&D Center, Samsung Electron., Beijing, China
  • fYear
    2010
  • fDate
    Nov. 29 2010-Dec. 3 2010
  • Firstpage
    233
  • Lastpage
    237
  • Abstract
    This paper proposes a data-driven lexicon refinement method. Expanding and polishing the lexicon with local and web resources effectively improves the accuracy of a Chinese automatic speech recognition (ASR) system. The proposed refinement process consists of two steps. First, an improved intra-word measure is introduced to expand the lexicon from local text corpora. Second, the expanded lexicon is polished by measuring the popularity of the appended words from web query results returned by a search engine. The evaluation experiments are carried out on a voice-enabled tourist information query application. Experimental results show that the proposed method yields a 7.9% relative reduction in character error rate (CER).
  • Keywords
    Internet; search engines; speech recognition; ASR; CER; Chinese speech recognition; automatic speech recognition; character error rate; data driven lexicon refinement; expanding lexicon; intra word measurement; lexicon refining process; local resources; polishing lexicon; search engine; voice enabled tourist information query system; web query; web resources; bi-gram measure; lexicon refinement
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
  • Conference_Location
    Tainan
  • Print_ISBN
    978-1-4244-6244-5
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2010.5684905
  • Filename
    5684905