Title :
A Bag-of-Pages Approach to Unordered Multi-page Document Classification
Author :
Gordo, Albert ; Perronnin, Florent
Author_Institution :
Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain
Abstract :
We consider the problem of classifying documents containing multiple unordered pages. For this purpose, we propose a novel bag-of-pages document representation. To represent a document, one assigns every page to a prototype in a codebook of pages. This leads to a histogram representation which can then be fed to any discriminative classifier. We also consider several refinements over this initial approach. We show on two challenging datasets that the proposed approach significantly outperforms a baseline system.
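The representation described in the abstract can be illustrated with a minimal sketch. It assumes details the abstract does not give: each page is already a fixed-length descriptor, assignment is hard (nearest prototype under Euclidean distance), and the codebook has been learned beforehand by some clustering step. The names `nearest_prototype` and `bag_of_pages` and the toy data are hypothetical, and the refinements mentioned in the abstract are not covered.

```python
from math import dist

def nearest_prototype(page, codebook):
    """Index of the codebook prototype closest (Euclidean) to a page descriptor."""
    return min(range(len(codebook)), key=lambda k: dist(page, codebook[k]))

def bag_of_pages(pages, codebook):
    """L1-normalised histogram of prototype assignments; page order is ignored."""
    hist = [0.0] * len(codebook)
    for page in pages:
        hist[nearest_prototype(page, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Toy data (assumed for illustration): 2-D page descriptors, 3 prototypes.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
document = [(0.1, 0.0), (0.9, 0.1), (1.0, -0.1), (0.0, 0.8)]

histogram = bag_of_pages(document, codebook)
# The same pages in a different order yield the same histogram.
shuffled = bag_of_pages(list(reversed(document)), codebook)
```

The resulting histogram is a fixed-length vector regardless of page count or page order, which is what allows it to be passed to any discriminative classifier.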
Keywords :
document image processing; image classification; bag-of-pages document representation; codebook; discriminative classifier; histogram representation; unordered multi-page document classification; Accuracy; Feature extraction; Hidden Markov models; Histograms; Kernel; Training; Visualization; document classification; Fisher kernel
Conference_Title :
2010 20th International Conference on Pattern Recognition (ICPR)
Conference_Location :
Istanbul
Print_ISBN :
978-1-4244-7542-1
DOI :
10.1109/ICPR.2010.473