DocumentCode :
2142288
Title :
Browsing Heterogeneous Document Collections by a Segmentation-Free Word Spotting Method
Author :
Rusiñol, Marçal ; Aldavert, David ; Toledo, Ricardo ; Lladós, Josep
Author_Institution :
Dept. Cienc. de la Computació, Univ. Autònoma de Barcelona, Bellaterra, Spain
fYear :
2011
fDate :
18-21 Sept. 2011
Firstpage :
63
Lastpage :
67
Abstract :
In this paper, we present a segmentation-free word spotting method that can handle heterogeneous document image collections. We propose a patch-based framework in which patches are represented by a bag-of-visual-words model built on SIFT descriptors. The resulting feature vectors are then refined by applying latent semantic indexing (LSI). The proposed method performs well on both handwritten and typewritten historical document images. We have also tested it on documents written in non-Latin scripts.
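The pipeline summarized in the abstract (visual-word quantization of local descriptors, one bag-of-visual-words histogram per patch, LSI refinement, then ranking patches against a query) can be illustrated with a minimal NumPy sketch. It assumes the dense SIFT descriptors, their (x, y) locations, and a visual-word codebook (e.g. obtained by k-means) are already available. All function names are hypothetical, and this is not the authors' implementation. The paper's patch sizes, vocabulary size, number of LSI topics, and any term weighting (such as tf-idf) are not given in this record, so they are left as parameters or omitted.

```python
import numpy as np

# Hypothetical helpers illustrating a BoVW + LSI patch-retrieval pipeline;
# not the authors' code. Descriptors and codebook are assumed precomputed.

def quantize(descriptors, codebook):
    """Assign each descriptor (n x d) to its nearest codeword (k x d)."""
    # Squared Euclidean distances via broadcasting: (n, 1, d) - (1, k, d).
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def patch_histograms(points, words, patches, vocab_size):
    """Build one bag-of-visual-words histogram per rectangular patch.

    points:  (n, 2) keypoint locations as (x, y)
    words:   (n,) non-negative visual-word index of each keypoint
    patches: list of (x0, y0, x1, y1) rectangles, e.g. a dense sliding grid
    """
    hists = np.zeros((len(patches), vocab_size))
    for i, (x0, y0, x1, y1) in enumerate(patches):
        inside = ((points[:, 0] >= x0) & (points[:, 0] < x1) &
                  (points[:, 1] >= y0) & (points[:, 1] < y1))
        # Count how often each visual word occurs inside this patch.
        hists[i] = np.bincount(words[inside], minlength=vocab_size)
    return hists

def lsi_fit(hists, n_topics):
    """LSI step: truncated SVD of the patch-by-word matrix.

    Returns the top right singular vectors (vocab_size x n_topics), used as
    a projection basis from word space into the latent topic space.
    """
    _, _, vt = np.linalg.svd(hists, full_matrices=False)
    return vt[:n_topics].T

def lsi_project(hists, basis):
    """Project histograms into the latent space and L2-normalise them."""
    z = hists @ basis
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z / np.where(norms == 0, 1, norms)

def rank_patches(query_hist, patch_hists, basis):
    """Return patch indices sorted by decreasing cosine similarity to the query."""
    q = lsi_project(query_hist[None, :], basis)[0]
    p = lsi_project(patch_hists, basis)
    return np.argsort(-(p @ q))
```

Usage: compute `words = quantize(descriptors, codebook)`, build `hists = patch_histograms(points, words, patches, len(codebook))` over a dense grid of patches on each page, fit `basis = lsi_fit(hists, n_topics)`, and rank the patches with `rank_patches(query_hist, hists, basis)`. Because the query and the patches are projected with the same basis, the sign ambiguity of the SVD does not affect the ranking.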
Keywords :
document image processing; feature extraction; handwriting recognition; indexing; word processing; SIFT descriptors; bag of visual word model; feature vectors; handwritten historical document images; heterogeneous document image collections; latent semantic indexing technique; nonLatin scripts; patch based framework; segmentation free word spotting method; typewritten historical document images; Feature extraction; Hidden Markov models; Image segmentation; Indexing; Large scale integration; Semantics; Visualization; Dense SIFT Features; Heterogeneous Document Collections; Latent Semantic Indexing; Word Spotting;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
ISSN :
1520-5363
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2011.22
Filename :
6065277