Title :
Document Image Retrieval with Local Feature Sequences
Author :
Li, Jilin ; Fan, Zhi-Gang ; Wu, Yadong ; Le, Ning
Abstract :
In recent years, many document image retrieval algorithms have been proposed. However, most current approaches either require good-quality images or depend on the page layout structure. This paper presents a fast, accurate, and OCR-free image retrieval algorithm based on local feature sequences, which describe the intrinsic, unique, and page-layout-free characteristics of document images. With a simple preprocessing step, the local feature sequences can be extracted without print-core detection or image registration. An efficient coarse-to-fine common substring matching strategy is then applied to match the local feature sequences. Beyond producing a single matching score, this approach can locate the matched parts word by word. It handles challenges including low resolution, different languages, rotation, incompleteness, and N-up. Encouraging experimental results on a large-scale document image database show that the retrieval outputs are sufficiently good to be used directly as document image identification results.
Keywords :
document image processing; feature extraction; image matching; image resolution; image retrieval; image sequences; string matching; OCR-free image retrieval algorithm; document image identification; document image retrieval; image quality; image registration; image resolution; image rotation; large scale document image database; local feature sequence extraction; page layout structure; print-core detection; substring matching strategy; Algorithm design and analysis; Image analysis; Image databases; Image recognition; Image resolution; Image retrieval; Image sequence analysis; Large-scale systems; Shape; Text analysis; Common Substring; Document Image Retrieval; Local Feature Sequences; Suffix Tree;
Conference_Title :
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-4500-4
Electronic_ISBN :
1520-5363
DOI :
10.1109/ICDAR.2009.46