• DocumentCode
    2503936
  • Title
    A Bag-of-Pages Approach to Unordered Multi-page Document Classification
  • Author
    Gordo, Albert; Perronnin, Florent
  • Author_Institution
    Comput. Vision Center, Univ. Autonoma de Barcelona, Barcelona, Spain
  • fYear
    2010
  • fDate
    23-26 Aug. 2010
  • Firstpage
    1920
  • Lastpage
    1923
  • Abstract
    We consider the problem of classifying documents containing multiple unordered pages. For this purpose, we propose a novel bag-of-pages document representation. To represent a document, one assigns every page to a prototype in a codebook of pages. This leads to a histogram representation which can then be fed to any discriminative classifier. We also consider several refinements over this initial approach. We show on two challenging datasets that the proposed approach significantly outperforms a baseline system.
  • Keywords
    document image processing; image classification; bag-of-pages document representation; codebook; discriminative classifier; histogram representation; unordered multi-page document classification; Accuracy; Feature extraction; Hidden Markov models; Histograms; Kernel; Training; Visualization; document classification; Fisher kernel
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2010 20th International Conference on
  • Conference_Location
    Istanbul
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4244-7542-1
  • Type
    conf
  • DOI
    10.1109/ICPR.2010.473
  • Filename
    5597249
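
The representation described in the abstract (assign every page to a codebook prototype, then build a histogram) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: the record does not say how pages are described, how the codebook is learned, or how assignment is done, so the sketch uses made-up 2-D page descriptors, a hand-written codebook, hard nearest-prototype assignment under Euclidean distance, and L1 normalization. The function name `bag_of_pages` is hypothetical.

```python
from math import dist


def bag_of_pages(page_features, codebook):
    """Build a normalized histogram of page-to-prototype assignments.

    Assumptions (not taken from the paper): hard assignment to the
    nearest prototype under Euclidean distance, L1 normalization.
    """
    counts = [0] * len(codebook)
    for page in page_features:
        # Index of the codebook prototype closest to this page descriptor.
        nearest = min(range(len(codebook)), key=lambda k: dist(page, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        # A document with no pages maps to an all-zero histogram.
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy, made-up data: three page prototypes and one four-page document.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
document = [(0.1, -0.2), (0.9, 1.2), (1.1, 0.8), (4.7, 5.3)]

histogram = bag_of_pages(document, codebook)
print(histogram)
```

Because the histogram only counts assignments, shuffling the pages of `document` leaves the result unchanged, which is what makes such a representation suitable for unordered pages. The resulting vector could then be passed to a discriminative classifier of the reader's choice.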