DocumentCode
2148401
Title
Document Image Indexing Using Edit Distance Based Hashing
Author
Hassan, Ehtesham ; Chaudhury, Santanu ; Gopal, M.
Author_Institution
Dept. of Electr. Eng., Indian Inst. of Technol., New Delhi, India
fYear
2011
fDate
18-21 Sept. 2011
Firstpage
1200
Lastpage
1204
Abstract
We present a novel word-image-based document indexing scheme that combines string matching and hashing. Each word image is represented by a string code obtained by unsupervised learning over graphical primitives. The indexing framework is built on a distance-based hashing function, which projects objects into hash space while preserving their distances. Edit-distance-based string matching is used both to define the hashing function and for approximate nearest-neighbor retrieval. The proposed indexing framework is applied to two document image collections, in Devanagari and Bengali script.
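The abstract does not give the hashing function itself, so the following is only a minimal sketch of how edit distance can drive a distance-based hash. It assumes the common line-projection formulation of distance-based hashing, in which an object is projected onto the "line" between two pivot objects and the projection is thresholded into one bit. The names `edit_distance`, `line_projection`, and `make_hash_bit`, the pivot strings, and the thresholds `t1` and `t2` are hypothetical and are not taken from the paper.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (unit-cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def line_projection(x, p1, p2, dist=edit_distance):
    """Project x onto the pivot pair (p1, p2) using distances only.

    Assumed formulation: (d(x,p1)^2 + d(p1,p2)^2 - d(x,p2)^2) / (2 * d(p1,p2)).
    """
    d12 = dist(p1, p2)
    if d12 == 0:
        raise ValueError("pivots must be distinct")
    return (dist(x, p1) ** 2 + d12 ** 2 - dist(x, p2) ** 2) / (2 * d12)


def make_hash_bit(p1, p2, t1, t2):
    """Return a binary hash function: 1 if the projection falls in [t1, t2], else 0."""
    def bit(x):
        return 1 if t1 <= line_projection(x, p1, p2) <= t2 else 0
    return bit


def hash_key(x, bits):
    """Concatenate several binary hash functions into one bucket key."""
    return tuple(b(x) for b in bits)
```

In such a scheme, word-image string codes that share a `hash_key` land in the same bucket, and a query would be compared by edit distance only against the codes in its bucket to approximate nearest-neighbor retrieval.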
Keywords
cryptography; file organisation; image representation; image retrieval; indexing; string matching; unsupervised learning; word processing; Bengali script; Devanagari script; approximate nearest neighbor based retrieval; document image collections; edit distance based hashing; graphical primitives; hash space; hashing function; image representation; object projection; string codes; string matching; unsupervised learning; word image based document indexing scheme; Equations; Image representation; Image segmentation; Indexing; Shape; Text analysis; Distance based hashing; Document image indexing; Edit distance; Shape descriptor;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location
Beijing
ISSN
1520-5363
Print_ISBN
978-1-4577-1350-7
Electronic_ISBN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2011.242
Filename
6065500