• DocumentCode
    1580298
  • Title
    Robust word recognition for museum archive card indexing
  • Author
    Lucas, S.M.; Tams, A.C.; Cho, Sung J.; Ryu, Sungho; Downton, A.C.
  • Author_Institution
    Dept. of Comput. Sci., Essex Univ., Colchester, UK
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    144
  • Lastpage
    148
  • Abstract
    We describe a novel, robust approach to enable efficient searching of the type-written text on museum archive cards. Depending on such factors as the state of the typewriter and its ribbon, these text images may be faint, with parts of characters missing, or in heavy type, with adjacent characters merging together. Both problems make this kind of text hard to read with conventional OCR methods, which rely on a limited number of segmentation hypotheses prior to recognition. Our method slides a classifier over the entire word or card image to obtain a set of recognition hypotheses for each possible window position, which gives rise to a large character hypothesis graph. We then apply a graph reduction, followed by an efficient graph search method to search for words in the reduced graph. Results so far are promising, with our system achieving 45% word recognition accuracy compared to the 25% achieved by a leading commercial package. Searching the original, larger graphs is much slower but yields 85% accuracy, so further work is needed either in improving the graph reduction method or in improving the efficiency with which we can search the larger graph.
  • Keywords
    database indexing; document image processing; humanities; optical character recognition; visual databases; OCR; character hypothesis graph; character merging; classifier; graph reduction; graph search; museum archive card indexing; text searching; type-written text; word recognition; Character recognition; Image recognition; Image segmentation; Indexing; Merging; Optical character recognition software; Packaging; Robustness; Search methods; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
  • Conference_Location
    Seattle, WA
  • Print_ISBN
    0-7695-1263-1
  • Type
    conf
  • DOI
    10.1109/ICDAR.2001.953772
  • Filename
    953772
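
The word search over a character hypothesis graph described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the edge format `(start, end, char, score)`, the use of log-probability scores, and the names `build_graph` and `find_word` are all assumptions made for illustration, and the graph reduction step is not modelled.

```python
from collections import defaultdict

def build_graph(hypotheses):
    """Index hypothetical edges (start, end, char, score) by start position."""
    graph = defaultdict(list)
    for start, end, char, score in hypotheses:
        graph[start].append((end, char, score))
    return graph

def find_word(graph, word, starts):
    """Return the best (score, start, end) over paths spelling `word`, or None."""
    best = None
    for s in starts:
        # frontier maps a position to the best cumulative score reaching it
        frontier = {s: 0.0}
        for ch in word:
            nxt = {}
            for pos, acc in frontier.items():
                for end, c, score in graph.get(pos, ()):
                    if c == ch:
                        total = acc + score
                        if total > nxt.get(end, float("-inf")):
                            nxt[end] = total
            frontier = nxt
            if not frontier:
                break  # no path from s spells the word
        for end, total in frontier.items():
            if best is None or total > best[0]:
                best = (total, s, end)
    return best

# Toy hypotheses (assumed log-probabilities) for a three-character image;
# the (10, 30, "m") edge imitates two merged characters read as one.
graph = build_graph([
    (0, 10, "c", -0.1), (0, 10, "e", -1.2),
    (10, 20, "a", -0.3), (10, 30, "m", -0.9),
    (20, 30, "t", -0.2), (20, 30, "l", -1.5),
])
print(find_word(graph, "cat", [0]))
print(find_word(graph, "cab", [0]))
```

Because every window position contributes competing edges, no single segmentation has to be committed to before recognition; the query word itself selects the path through the graph.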