• DocumentCode
    1135323
  • Title

    Document Image Retrieval through Word Shape Coding

  • Author

    Lu, Shijian ; Li, Linlin ; Tan, Chew Lim

  • Author_Institution
    Agency for Sci., Technol. & Res., Inst. for Infocomm Res., Singapore
  • Volume
    30
  • Issue
    11
  • fYear
    2008
  • Firstpage
    1913
  • Lastpage
    1918
  • Abstract
    This paper presents a document retrieval technique that can search document images without optical character recognition (OCR). The proposed technique retrieves document images using a new word shape coding scheme, which captures the document content by annotating each word image with a word shape code. In particular, we annotate word images using a set of topological shape features, including character ascenders/descenders, character holes, and character water reservoirs. With the annotated word shape codes, document images can be retrieved by either query keywords or a query document image. Experimental results show that the proposed document image retrieval technique is fast, efficient, and tolerant to various types of document degradation.
  • Keywords
    document image processing; image coding; image retrieval; annotated word shape codes; character ascenders; character descenders; character holes; character water reservoirs; document content; document degradation; document image retrieval; document image searching; query document image; query keywords; topological shape features; word image annotation; word shape coding; Artificial Intelligence; Computing Methodologies; Document Capture; Document analysis; Document and Text Processing; Image/video retrieval; Shape; Text processing; Vision and Scene Understanding; Automatic Data Processing; Database Management Systems; Databases, Factual; Documentation; Image Enhancement; Image Interpretation, Computer-Assisted; Information Storage and Retrieval; Language; Pattern Recognition, Automated; Reading
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type

    jour

  • DOI
    10.1109/TPAMI.2008.89
  • Filename
    4492785
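
The abstract describes annotating each word with a code built from ascenders/descenders and holes, then retrieving documents by query keyword. The sketch below is a rough text-side illustration of that idea and is not the coding scheme from the paper: the letter sets, the code symbols (`A`, `D`, `H`, `x`), and the function names `char_code`, `word_shape_code`, `build_index`, and `search` are all assumptions made for this example. It works on characters rather than word images, ignores water reservoirs, and lowercases its input.

```python
from collections import defaultdict

# Assumed letter classes for a typical lowercase Latin typeface.
# These sets are illustrative, not taken from the paper.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")
HOLES = set("abdegopq")


def char_code(ch: str) -> str:
    """Return a hypothetical shape code for one character."""
    ch = ch.lower()  # simplification: capitals are treated like lowercase
    code = ""
    if ch in ASCENDERS:
        code += "A"
    if ch in DESCENDERS:
        code += "D"
    if ch in HOLES:
        code += "H"
    return code or "x"  # "x" marks a character with none of the features


def word_shape_code(word: str) -> str:
    """Join per-character codes into a code for the whole word."""
    return "-".join(char_code(c) for c in word if c.isalpha())


def build_index(docs: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each word shape code to the ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, words in docs.items():
        for word in words:
            index[word_shape_code(word)].add(doc_id)
    return index


def search(index: dict[str, set[str]], keyword: str) -> set[str]:
    """Return ids of documents with a word matching the keyword's code."""
    return set(index.get(word_shape_code(keyword), set()))


if __name__ == "__main__":
    index = build_index({"doc1": ["shape", "coding"], "doc2": ["image"]})
    print(word_shape_code("coding"))
    print(search(index, "coding"))
```

Because the code keeps only coarse shape features, distinct words can share a code (here "dog" and "bog" both map to the same string), so a keyword match in this sketch identifies candidate documents rather than exact hits.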