• DocumentCode
    1636001
  • Title
    A Rotation Invariant Page Layout Descriptor for Document Classification and Retrieval
  • Author
    Gordo, Albert; Valveny, Ernest
  • Author_Institution
    Comput. Vision Center, Univ. Autonoma de Barcelona, Barcelona, Spain
  • fYear
    2009
  • Firstpage
    481
  • Lastpage
    485
  • Abstract
    Document classification usually requires structural features, such as the physical layout, to obtain good accuracy rates on complex documents. This paper introduces a descriptor of the layout and a distance measure based on cyclic dynamic time warping, which can be computed in O(n^2). The descriptor is translation invariant and can easily be modified to be scale and rotation invariant. Experiments with this descriptor and its rotation-invariant modification are performed on the Girona archives database and compared against another common layout distance, the minimum weight edge cover (MWEC). The experiments show that these methods outperform the MWEC in both accuracy and speed, particularly on rotated documents.
  • Keywords
    classification; computational complexity; document handling; information retrieval; Girona archive database; cyclic dynamic time warping; document classification; document retrieval; minimum weight edge cover; rotation invariant page layout descriptor; Computer vision; Databases; Earth; Feature extraction; Image segmentation; Optical character recognition software; Pixel; Text analysis; Time measurement; Tree graphs; retrieval; rotation invariant
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4244-4500-4
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2009.110
  • Filename
    5277619
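
The abstract names a distance based on cyclic dynamic time warping. As a rough illustration of the idea only, the sketch below computes a classic DTW cost and then takes the minimum over every cyclic shift of one sequence. This is not the authors' method: it is a naive, simplified variant that rotates only one sequence and costs roughly O(n^3), not the O(n^2) stated in the abstract. The function names, the scalar sequences, and the absolute-difference cost are all assumptions made for the example; the record does not describe the actual layout descriptor.

```python
from math import inf


def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic time warping cost, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    # cost[i][j] holds the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] matched again with an earlier b item
                cost[i][j - 1],      # b[j-1] matched again with an earlier a item
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def cyclic_dtw_bruteforce(a, b, dist=lambda x, y: abs(x - y)):
    """Minimum DTW cost over all cyclic shifts of b (hypothetical helper).

    Assumption: only b is rotated, and every rotation is tried in turn,
    so this is a simplified stand-in rather than the paper's algorithm.
    """
    shifts = range(len(b)) or range(1)  # still evaluate once when b is empty
    return min(dtw(a, b[s:] + b[:s], dist) for s in shifts)
```

With these definitions, a sequence and a rotated copy of it have a nonzero plain DTW cost but a zero cost under the rotation-minimizing variant, which is the property that makes a cyclic distance attractive for rotated layouts.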