• DocumentCode
    3140980
  • Title
    Document image layout comparison and classification
  • Author
    Hu, Jianying; Kashi, Ramanujan; Wilfong, Gordon
  • Author_Institution
    Lucent Technol., AT&T Bell Labs., Murray Hill, NJ, USA
  • fYear
    1999
  • fDate
    20-22 Sep 1999
  • Firstpage
    285
  • Lastpage
    288
  • Abstract
    The paper describes features and methods for document image comparison and classification at the spatial layout level. The methods are useful for visual similarity based document retrieval as well as fast algorithms for initial document type classification without OCR. A novel feature set called interval encoding is introduced to capture elements of spatial layout. This feature set encodes region layout information in fixed-length vectors which can be used for fast page layout comparison. The paper describes experiments and results to rank-order a set of document pages in terms of their layout similarity to a test document. We also demonstrate the usefulness of the features derived from interval encoding in a hidden Markov model based page layout classification system that is trainable and extendible.
  • Keywords
    document image processing; encoding; hidden Markov models; image classification; information retrieval; HMM; document image classification; document image layout comparison; document pages; fast algorithms; fast page layout comparison; fixed-length vectors; hidden Markov model based page layout classification system; initial document type classification; interval encoding; layout similarity; region layout information; spatial layout; spatial layout level; test document; visual similarity based document retrieval; Data mining; Electronic switching systems; Image databases; Image retrieval; Information retrieval; Optical character recognition software; Shape measurement; Spatial databases; Spatial resolution; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
  • Conference_Location
    Bangalore
  • Print_ISBN
    0-7695-0318-7
  • Type
    conf
  • DOI
    10.1109/ICDAR.1999.791780
  • Filename
    791780