• DocumentCode
    2148401
  • Title
    Document Image Indexing Using Edit Distance Based Hashing
  • Author
    Hassan, Ehtesham ; Chaudhury, Santanu ; Gopal, M.

  • Author_Institution
    Dept. of Electr. Eng., Indian Inst. of Technol., New Delhi, India
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    1200
  • Lastpage
    1204
  • Abstract
    We present a novel word-image-based document indexing scheme that combines string matching and hashing. Each word image is represented by a string code obtained by unsupervised learning over graphical primitives. The indexing framework is built on a distance-based hashing function, which projects objects into hash space while preserving their distances. We use edit-distance-based string matching both to define the hashing function and to perform approximate nearest-neighbor retrieval. The proposed indexing framework is applied to two document image collections in the Devanagari and Bengali scripts.
  • Keywords
    cryptography; file organisation; image representation; image retrieval; indexing; string matching; unsupervised learning; word processing; Bengali script; Devanagari script; approximate nearest neighbor based retrieval; document image collections; edit distance based hashing; graphical primitives; hash space; hashing function; object projection; string codes; word image based document indexing scheme; Equations; Image segmentation; Shape; Text analysis; Distance based hashing; Document image indexing; Edit distance; Shape descriptor
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2011 International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4577-1350-7
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2011.242
  • Filename
    6065500
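
The abstract describes hashing string codes with an edit-distance-based function and retrieving approximate nearest neighbors from the matching hash bucket. The record does not give the exact form of the hashing function, so the following is only a minimal sketch under stated assumptions: it uses a line projection onto two pivot strings (a common construction in distance-based hashing), thresholds each projection at the median to get one bit, and uses plain Levenshtein distance with unit costs. The names `edit_distance`, `line_projection`, and `EditDistanceHashIndex`, and the parameters `num_bits` and `seed`, are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict
from statistics import median


def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit costs, an assumption)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def line_projection(x, p1, p2):
    """Project x onto the 'line' through pivots p1 and p2 using distances only."""
    d12 = edit_distance(p1, p2)  # must be > 0, so the pivots must differ
    return (edit_distance(x, p1) ** 2 + d12 ** 2
            - edit_distance(x, p2) ** 2) / (2 * d12)


class EditDistanceHashIndex:
    """Hypothetical index: each bit thresholds one pivot-pair projection."""

    def __init__(self, codes, num_bits=4, seed=0):
        rng = random.Random(seed)
        self.codes = list(codes)
        distinct = sorted(set(self.codes))  # needs at least two distinct codes
        self.functions = []
        for _ in range(num_bits):
            p1, p2 = rng.sample(distinct, 2)  # two different pivot strings
            # Assumption: split the collection roughly in half at the median.
            threshold = median(line_projection(c, p1, p2) for c in self.codes)
            self.functions.append((p1, p2, threshold))
        self.buckets = defaultdict(list)
        for c in self.codes:
            self.buckets[self.key(c)].append(c)

    def key(self, x):
        """Binary hash key of x: one bit per (pivot pair, threshold)."""
        return tuple(int(line_projection(x, p1, p2) >= t)
                     for p1, p2, t in self.functions)

    def query(self, x, k=3):
        """Approximate nearest neighbors: rank only the codes in x's bucket."""
        candidates = self.buckets.get(self.key(x), [])
        return sorted(candidates, key=lambda c: edit_distance(x, c))[:k]
```

With an index built as `EditDistanceHashIndex(list_of_string_codes)`, calling `query(code)` compares the query only against codes that share its hash key, which is what makes the retrieval approximate: a true nearest neighbor that falls in a different bucket is missed. Using several independent hash tables is a typical way to reduce such misses, though whether the paper does so is not stated in this record.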