• DocumentCode
    1637543
  • Title

    Document Image Retrieval with Local Feature Sequences

  • Author

    Li, Jilin ; Fan, Zhi-Gang ; Wu, Yadong ; Le, Ning

  • fYear
    2009
  • Firstpage
    346
  • Lastpage
    350
  • Abstract
    In recent years, many document image retrieval algorithms have been proposed. However, most current approaches either need good-quality images or depend on the page layout structure. This paper presents a fast, accurate, and OCR-free image retrieval algorithm using local feature sequences, which describe the intrinsic, unique, and page-layout-free characteristics of document images. With a simple preprocessing step, the local feature sequences can be extracted without print-core detection or image registration. An efficient coarse-to-fine common substring matching strategy is then applied to match the local feature sequences. Beyond a single matching score, this approach can locate the matched parts word by word. It copes well with challenges including low resolution, different languages, rotation, incompleteness, and N-up printing. Encouraging experimental results on a large-scale document image database show that the retrieval outputs are sufficiently good to be used directly as document image identification results.
  • Keywords
    document image processing; feature extraction; image matching; image resolution; image retrieval; image sequences; string matching; OCR-free image retrieval algorithm; document image identification; document image retrieval; image quality; image registration; image resolution; image rotation; large scale document image database; local feature sequence extraction; page layout structure; print-core detection; substring matching strategy; Algorithm design and analysis; Image analysis; Image databases; Image recognition; Image resolution; Image retrieval; Image sequence analysis; Large-scale systems; Shape; Text analysis; Common Substring; Document Image Retrieval; Local Feature Sequences; Suffix Tree;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4244-4500-4
  • Electronic_ISBN
    1520-5363
  • Type

    conf

  • DOI
    10.1109/ICDAR.2009.46
  • Filename
    5277676
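
The abstract describes matching local feature sequences by common substrings and locating the matched parts word by word. The sketch below illustrates only that basic idea, under stated assumptions: each word is assumed to be represented by one integer feature code (a hypothetical encoding, not taken from the paper), and a simple dynamic-programming longest-common-substring search stands in for the suffix-tree and coarse-to-fine strategy named in the abstract and keywords. The function name `longest_common_substring` and the sample data are illustrative only.

```python
from typing import Sequence, Tuple


def longest_common_substring(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int, int]:
    """Return (start_in_a, start_in_b, length) of the longest contiguous run
    shared by both sequences, or (0, 0, 0) if nothing matches."""
    best = (0, 0, 0)
    # prev[j] holds the length of the common run ending at a[i-2], b[j-1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended one position earlier in both sequences.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best[2]:
                    best = (i - curr[j], j - curr[j], curr[j])
        prev = curr
    return best


# Hypothetical feature codes, one integer per word (assumption, not the paper's features).
query = [3, 5, 2, 7, 4, 4, 6]
stored = [9, 1, 5, 2, 7, 4, 8, 3]

start_q, start_s, length = longest_common_substring(query, stored)
print(start_q, start_s, length)
```

The returned offsets indicate which run of words in the query lines up with which run in the stored sequence, which is the kind of word-by-word localization the abstract mentions. The dynamic-programming search takes time proportional to the product of the two sequence lengths, so a large database would likely need an indexed structure such as the suffix tree listed in the keywords.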