• DocumentCode
    773698
  • Title
    Statistical syntactic methods for high-performance OCR
  • Author
    Lucas, S.; Amiri, A.
  • Author_Institution
    Dept. of Electron. Syst. Eng., Essex Univ., Colchester, UK
  • Volume
    143
  • Issue
    1
  • fYear
    1996
  • fDate
    2/1/1996 12:00:00 AM
  • Firstpage
    23
  • Lastpage
    30
  • Abstract
    The paper describes a new method for language modelling and reports its application to handwritten OCR. Images of characters are first chain-coded to convert them into strings. A novel language modelling method is then applied to build a statistical model for the strings of each class. The language modelling method is based on a probabilistic version of an n-tuple classifier that is scanned along the entire string during both training and recognition. The method is extremely fast and robust, and it concentrates all the computational effort on the part of the image where the information lies, i.e. the edges left by the trace of the pen. Results on the CEDAR handwritten digit database show the new method to be almost as accurate as the best methods reported so far, while offering a significant speed advantage.
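    The two stages described in the abstract (chain-coding a character image into a string, then scoring that string with a per-class statistical model scanned along its full length) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Freeman 8-direction code assumes the contour has already been traced into an ordered list of 8-connected pixels, and the tuple `offsets`, the add-one smoothing, and the helper names `chain_code`, `ScanningNTuple`, and `classify` are all hypothetical.

```python
import math

# Freeman 8-direction codes: list index = code for a unit step (dx, dy),
# counting anticlockwise from east with y pointing up.
DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def chain_code(points):
    """Map an ordered list of 8-connected contour points to chain-code symbols."""
    # Raises ValueError if two consecutive points are not 8-neighbours.
    return [DIRS.index((x1 - x0, y1 - y0))
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


class ScanningNTuple:
    """Hypothetical per-class model: counts of symbol tuples sampled at fixed
    offsets, collected at every position as the tuple is scanned along a string."""

    def __init__(self, offsets=(0, 1, 2), alphabet_size=8):
        self.offsets = offsets          # assumed tuple geometry, not from the paper
        self.alphabet_size = alphabet_size
        self.counts = {}
        self.total = 0

    def _tuples(self, s):
        # Every start position where all offsets still fall inside the string.
        span = max(self.offsets)
        for i in range(len(s) - span):
            yield tuple(s[i + k] for k in self.offsets)

    def train(self, s):
        for t in self._tuples(s):
            self.counts[t] = self.counts.get(t, 0) + 1
            self.total += 1

    def log_likelihood(self, s):
        # Add-one smoothing over every possible tuple value (an assumption),
        # summing log-probabilities as if scan positions were independent.
        n_cells = self.alphabet_size ** len(self.offsets)
        return sum(math.log((self.counts.get(t, 0) + 1) / (self.total + n_cells))
                   for t in self._tuples(s))


def classify(models, s):
    """Return the class label whose model gives string s the highest score."""
    return max(models, key=lambda label: models[label].log_likelihood(s))


# Toy usage: one "horizontal stroke" class and one "vertical stroke" class.
models = {"h": ScanningNTuple(), "v": ScanningNTuple()}
models["h"].train(chain_code([(x, 0) for x in range(7)]))   # codes all 0
models["v"].train(chain_code([(0, y) for y in range(7)]))   # codes all 2
print(classify(models, chain_code([(x, 0) for x in range(5)])))  # prints h
```

    Because every class model scores the same string at the same number of scan positions, the summed log-likelihoods are directly comparable, and recognition reduces to picking the class with the largest score. In this sketch both training and scoring take time linear in the string length, which fits the abstract's point that effort is spent only along the pen trace.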
  • Keywords
    edge detection; handwriting recognition; image classification; image coding; natural languages; optical character recognition; probability; statistical analysis; CEDAR handwritten digit database; chain-coding; character images; handwritten OCR; high-performance OCR; language modelling; n-tuple classifier; probabilistic version; recognition; speed advantage; statistical model; statistical syntactic methods; strings; training
  • fLanguage
    English
  • Journal_Title
    IEE Proceedings - Vision, Image and Signal Processing
  • Publisher
    IET
  • ISSN
    1350-245X
  • Type
    jour
  • DOI
    10.1049/ip-vis:19960253
  • Filename
    487843