• DocumentCode
    1360466
  • Title

    Integrating knowledge sources in Devanagari text recognition system

  • Author

    Bansal, Veena ; Sinha, R.M.K.

  • Author_Institution
    Dept. of Ind. & Manage. Eng., Indian Inst. of Technol., Kanpur, India
  • Volume
    30
  • Issue
    4
  • fYear
    2000
  • fDate
    7/1/2000 12:00:00 AM
  • Firstpage
    500
  • Lastpage
    505
  • Abstract
    The reading process has been widely studied and there is a general agreement among researchers that knowledge in different forms and at different levels plays a vital role. This is the underlying philosophy of the Devanagari document recognition system described in this work. The knowledge sources we use are mostly statistical in nature or in the form of a word dictionary tailored specifically for optical character recognition (OCR). We do not perform any reasoning on these. However, we explore their relative importance and role in the hierarchy. Some of the knowledge sources are acquired a priori by an automated training process while others are extracted from the text as it is processed. A complete Devanagari OCR system has been designed and tested with real-life printed documents of varying size and font. Most of the documents used were photocopies of the original. A performance of approximately 90% correct recognition is achieved.
  • Keywords
    document image processing; knowledge based systems; learning (artificial intelligence); optical character recognition; Devanagari text recognition; document image recognition; knowledge based system; knowledge sources; learning process; printed character recognition; word dictionary; Character recognition; Computer errors; Dictionaries; Error correction; Knowledge based systems; Natural languages; Optical character recognition software; Speech recognition; System testing; Text recognition
  • fLanguage
    English
  • Journal_Title
    Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1083-4427
  • Type

    jour

  • DOI
    10.1109/3468.852443
  • Filename
    852443