• DocumentCode
    3190084
  • Title

    Document classification using layout analysis

  • Author

    Hu, Jianying ; Kashi, Ramanujan ; Wilfong, Gordon

  • Author_Institution
    Lucent Technol., AT&T Bell Labs., Murray Hill, NJ, USA
  • fYear
    1999
  • fDate
    1999
  • Firstpage
    556
  • Lastpage
    560
  • Abstract
    This paper describes methods for document image classification at the spatial layout level. The goal is to develop fast algorithms for initial document type classification without OCR, which can then be verified using more elaborate methods based on more detailed geometric and syntactic models. A novel feature set called interval encoding is introduced to capture elements of spatial layout. This feature set encodes region layout information in fixed-length vectors by capturing structural characteristics of the image. We demonstrate the usefulness of these features, derived from interval encoding, in a hidden Markov model-based page layout classification system that is trainable and extendible.
  • Keywords
    document image processing; encoding; hidden Markov models; image classification; document classification; document image classification; hidden Markov model; interval coding; interval encoding; layout analysis; region layout information; spatial layout level; Content based retrieval; Data mining; Electrical capacitance tomography; Feature extraction; Image classification; Image segmentation; Optical character recognition software; Read only memory; Routing; Shape measurement
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Applications, 1999. Proceedings. Tenth International Workshop on
  • Conference_Location
    Florence
  • Print_ISBN
    0-7695-0281-4
  • Type

    conf

  • DOI
    10.1109/DEXA.1999.795245
  • Filename
    795245