• DocumentCode
    935440
  • Title

    Hybrid contextural text recognition with string matching

  • Author

    Sinha, R.M.K. ; Prasada, Birendra ; Houle, Gilles F. ; Sabourin, Michael

  • Author_Institution
    Indian Inst. of Technol., Kanpur, India
  • Volume
    15
  • Issue
    9
  • fYear
    1993
  • fDate
    9/1/1993
  • Firstpage
    915
  • Lastpage
    925
  • Abstract
    The hybrid contextural algorithm for reading real-life documents printed in varying fonts of any size is presented. Text is recognized progressively in three passes. The first pass generates character hypotheses, the second generates word hypotheses, and the third verifies the word hypotheses. During the first pass, isolated characters are recognized using a dynamic contour warping classifier. Transient statistical information is collected to accelerate the recognition process and to verify hypotheses in later processing. A transient dictionary consisting of high-confidence nondictionary words is constructed in this pass. During the second pass, word-level hypotheses are generated using hybrid contextual text processing. Nondictionary words are recognized using a modified Viterbi algorithm, a string matching algorithm utilizing n-grams, special handlers for touching characters, pragmatic handlers for numerals, punctuation, hyphens, and apostrophes, and a prefix/suffix handler. This processing usually generates several word hypotheses. During the third pass, word-level verification occurs.
  • Keywords
    document image processing; optical character recognition; character hypothesis; dynamic contour warping classifier; hybrid contextural algorithm; hypothesis verification; modified Viterbi algorithm; progressive recognition; real-life documents; string matching; text recognition; transient dictionary; transient statistical information; word hypothesis; Acceleration; Algorithm design and analysis; Character generation; Character recognition; Costs; Dictionaries; Hybrid power systems; Senior members; Text recognition; Viterbi algorithm
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type

    jour

  • DOI
    10.1109/34.232077
  • Filename
    232077
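
  • Illustrative sketch
    The abstract mentions a string matching algorithm utilizing n-grams for generating word hypotheses. The record does not describe that algorithm, so the following is only a minimal sketch of one common way such matching can be done: ranking dictionary words by the Dice coefficient over padded character bigrams. The function names (char_ngrams, dice_similarity, rank_candidates), the '#' padding, and the choice of the Dice coefficient are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def char_ngrams(word, n=2):
    """Return a multiset of character n-grams of `word`.

    The word is lowercased and padded with '#' on both sides so that
    leading and trailing characters also contribute n-grams.
    (The padding symbol is an assumption made for this sketch.)
    """
    padded = "#" * (n - 1) + word.lower() + "#" * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dice_similarity(a, b, n=2):
    """Dice coefficient between the n-gram multisets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    # Counter '&' keeps the minimum count of each shared n-gram.
    overlap = sum((grams_a & grams_b).values())
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2.0 * overlap / total if total else 0.0


def rank_candidates(hypothesis, dictionary, n=2, top_k=3):
    """Rank dictionary words by n-gram similarity to a noisy hypothesis.

    Returns up to `top_k` (word, score) pairs, best match first;
    ties are broken alphabetically so the ordering is deterministic.
    """
    scored = [(dice_similarity(hypothesis, word, n), word) for word in dictionary]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(word, score) for score, word in scored[:top_k]]


if __name__ == "__main__":
    # Hypothetical misrecognized word and a toy dictionary.
    for word, score in rank_candidates(
        "rccognition", ["recognition", "reception", "cognition"]
    ):
        print(f"{word}: {score:.3f}")
```

    A ranking of this kind yields several scored word hypotheses per input string, which is the sort of candidate list that a later verification pass could then filter.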