• DocumentCode
    3346863
  • Title
    High performance Chinese OCR based on Gabor features, discriminative feature extraction and model training
  • Author
    Huo, Qiang ; Ge, Yong ; Feng, Zhi-Dan

  • Author_Institution
    Dept. of Comput. Sci. & Inf. Syst., Hong Kong Univ., China
  • Volume
    3
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    1517
  • Abstract
    We have developed a Chinese OCR engine for machine-printed documents. Currently, our OCR engine supports a vocabulary of 6921 characters, comprising 6707 simplified Chinese characters in GB2312-80, 12 frequently used GBK Chinese characters, 62 alphanumeric characters, and 140 punctuation marks and symbols. The supported font styles include Song, Fang Song, Kai, Hei, Yuan, LiShu, WeiBei, XingKai, etc. The average character recognition accuracy is above 99% for newspaper-quality documents, with a recognition speed of about 250 characters per second on a Pentium III 450 MHz PC while consuming less than 2 MB of memory. We describe the key technologies used to construct this recognizer. Among them, we highlight three techniques that contribute to the high recognition accuracy: the use of Gabor features, the use of discriminative feature extraction, and the use of minimum classification error as a criterion for model training.
  • Keywords
    character sets; document image processing; feature extraction; optical character recognition; Chinese OCR; Gabor features; GB2312-80; GBK Chinese characters; OCR engine; alphanumeric characters; character recognition; discriminative feature extraction; fonts; high performance OCR; machine printed documents; model training; punctuation marks; Automatic speech recognition; Character recognition; Engines; Feature extraction; Helium; Image recognition; Image segmentation; Optical character recognition software; Pattern recognition; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on
  • Conference_Location
    Salt Lake City, UT
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7041-4
  • Type
    conf
  • DOI
    10.1109/ICASSP.2001.941220
  • Filename
    941220