• DocumentCode
    2691385
  • Title

    A Chinese OCR spelling check approach based on statistical language models

  • Author

    Zhuang, Li ; Bao, Ta ; Zhu, Xiaoyan ; Wang, Chunheng ; Naoi, Satoshi

  • Author_Institution
    DCST, Tsinghua Univ., Beijing, China
  • Volume
    5
  • fYear
    2004
  • fDate
    10-13 Oct. 2004
  • Firstpage
    4727
  • Abstract
    This work describes an effective spelling check approach for Chinese OCR with a new multi-knowledge-based statistical language model. This language model combines the conventional n-gram language model with a new LSA (latent semantic analysis) language model, so that both local information (syntax) and global information (semantics) are utilized. Furthermore, similar Chinese characters are used in the Viterbi search process to expand the candidate list so that it contains more possibly correct results. With our approach, the best recognition accuracy rate increases from 79.3% to 91.9%, which corresponds to a 60.9% error reduction.
  • Keywords
    maximum likelihood estimation; natural languages; optical character recognition; spelling aids; Chinese optical character recognition; Viterbi search process; latent semantic analysis language; spelling check; statistical language models; Character recognition; Computer errors; Engines; Image recognition; Information analysis; Natural languages; Optical character recognition software; Optical computing; Probability; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Systems, Man and Cybernetics, 2004 IEEE International Conference on
  • ISSN
    1062-922X
  • Print_ISBN
    0-7803-8566-7
  • Type

    conf

  • DOI
    10.1109/ICSMC.2004.1401278
  • Filename
    1401278
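
The abstract's core idea (mixing a local n-gram score with a global semantic score, then running a Viterbi search over a candidate list expanded with similar characters) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear interpolation weight `lam`, the probability tables, the similar-character table, and all function names are hypothetical, the `semantic` dictionary is only a stand-in for an LSA-based model, and OCR confidence scores are left out for brevity.

```python
import math

# Hypothetical toy tables (assumptions, not data from the paper).
BIGRAM = {("自", "己"): 0.5, ("自", "已"): 0.01}   # local n-gram probabilities
SEMANTIC = {"己": 0.1, "已": 0.1}                  # stand-in for an LSA-style score
SIMILAR = {"已": ["己", "巳"]}                      # visually similar characters


def interpolated_logprob(prev, cur, bigram, semantic, lam=0.7, floor=1e-6):
    """Log of a linear mix of a local (bigram) and a global (semantic) score."""
    p_local = bigram.get((prev, cur), floor)
    p_global = semantic.get(cur, floor)
    return math.log(lam * p_local + (1.0 - lam) * p_global)


def expand_candidates(candidates, similar):
    """Append similar characters to each position's candidate list."""
    expanded = []
    for cands in candidates:
        merged = list(cands)
        for ch in cands:
            for alt in similar.get(ch, ()):
                if alt not in merged:
                    merged.append(alt)
        expanded.append(merged)
    return expanded


def viterbi(candidates, bigram, semantic, lam=0.7):
    """Return the highest-scoring character sequence through the candidate lattice."""
    # best[ch] = (log score, path) for the best path ending in ch so far
    best = {ch: (0.0, [ch]) for ch in candidates[0]}
    for cands in candidates[1:]:
        new_best = {}
        for cur in cands:
            score, path = max(
                (s + interpolated_logprob(prev, cur, bigram, semantic, lam), p)
                for prev, (s, p) in best.items()
            )
            new_best[cur] = (score, path + [cur])
        best = new_best
    return "".join(max(best.values())[1])


# The OCR engine proposed only "已" for the second character; expansion adds "己".
ocr_candidates = [["自"], ["已"]]
lattice = expand_candidates(ocr_candidates, SIMILAR)
corrected = viterbi(lattice, BIGRAM, SEMANTIC)
print(corrected)  # 自己
```

In this toy case the correct character is absent from the raw OCR candidates, so the search can only recover it because the similar-character expansion adds it to the lattice first.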