• DocumentCode
    1994294
  • Title
    Fast lexicon-based word recognition in noisy index card images
  • Author
    Lucas, Simon M. ; Patoulas, Gregory ; Downton, Andy C.
  • Author_Institution
    Comput. Sci. Dept., Essex Univ., Colchester, UK
  • fYear
    2003
  • fDate
    3-6 Aug. 2003
  • Firstpage
    462
  • Abstract
    This paper describes a complete system for reading typewritten lexicon words in noisy images, in this case museum index cards. The system is conceptually simple and straightforward to implement. It involves three stages of processing. The first stage extracts row regions from the image, where each row is a hypothesized line of text. The second stage scans an OCR classifier over each row image, creating a character hypothesis graph in the process. The third stage searches this graph with a priority-queue-based algorithm for the best matches against a set of words (the lexicon). Performance evaluation on a set of museum archive cards indicates competitive accuracy and reasonable throughput. The priority-queue algorithm is over two hundred times faster than flat dynamic programming on these graphs.
  • Keywords
    feature extraction; image classification; image denoising; image matching; image recognition; optical character recognition; OCR classifier; character hypothesis graph; flat dynamic programming; lexicon-based word recognition; museum index cards; noisy images; noisy index card images; performance evaluation; priority-queue based algorithm; type-written lexicon words; Algorithm design and analysis; Computer science; Dynamic programming; Image recognition; Image segmentation; Optical character recognition software; Packaging machines; Search methods; Systems engineering and theory; Throughput;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
  • Print_ISBN
    0-7695-1960-1
  • Type
    conf
  • DOI
    10.1109/ICDAR.2003.1227708
  • Filename
    1227708
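
The third stage described in the abstract, a priority-queue search of a character hypothesis graph against a lexicon, can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It is a generic best-first (Dijkstra-style) search under stated assumptions: the graph is a list of hypothetical `(src, dst, char, cost)` edges between horizontal positions in a row, costs are non-negative (for example scaled negative log-probabilities from the classifier), and the lexicon is stored in a trie. All names (`TrieNode`, `build_trie`, `best_lexicon_match`) are illustrative.

```python
import heapq
from collections import defaultdict


class TrieNode:
    """One trie node: child nodes by character, plus the word ending here."""
    __slots__ = ("children", "word")

    def __init__(self):
        self.children = {}
        self.word = None


def build_trie(words):
    """Build a trie holding every lexicon word."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word
    return root


def best_lexicon_match(edges, start, goal, lexicon):
    """Return (cost, word) for the cheapest lexicon word spelled by a path
    from `start` to `goal`, or None if no lexicon word fits.

    Assumes non-negative edge costs, so the first complete word popped
    from the queue is the cheapest one.
    """
    outgoing = defaultdict(list)
    for src, dst, ch, cost in edges:
        outgoing[src].append((dst, ch, cost))

    root = build_trie(lexicon)
    counter = 0  # tie-breaker so the heap never compares TrieNode objects
    heap = [(0, counter, start, root)]
    seen = set()  # (graph position, trie node) states already expanded

    while heap:
        cost, _, pos, node = heapq.heappop(heap)
        if pos == goal and node.word is not None:
            return cost, node.word
        state = (pos, id(node))
        if state in seen:
            continue
        seen.add(state)
        # Follow only character hypotheses that keep us inside the lexicon.
        for dst, ch, edge_cost in outgoing[pos]:
            child = node.children.get(ch)
            if child is not None:
                counter += 1
                heapq.heappush(heap, (cost + edge_cost, counter, dst, child))
    return None


# Hypothetical three-character row: two competing hypotheses per position.
edges = [
    (0, 1, "c", 2), (0, 1, "e", 3),
    (1, 2, "a", 3), (1, 2, "o", 2),
    (2, 3, "t", 1), (2, 3, "l", 4),
]
print(best_lexicon_match(edges, 0, 3, ["cat", "cot", "eat", "dog"]))  # (5, 'cot')
```

Because the trie prunes any partial path that is not a prefix of a lexicon word, the search expands far fewer states than scoring every word against the graph separately. This is one plausible reason a queue-based search could outperform flat dynamic programming, though the abstract does not say how the reported speed-up was obtained.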