DocumentCode
2503936
Title
A Bag-of-Pages Approach to Unordered Multi-page Document Classification
Author
Gordo, Albert ; Perronnin, Florent
Author_Institution
Comput. Vision Center, Univ. Autonoma de Barcelona, Barcelona, Spain
fYear
2010
fDate
23-26 Aug. 2010
Firstpage
1920
Lastpage
1923
Abstract
We consider the problem of classifying documents containing multiple unordered pages. For this purpose, we propose a novel bag-of-pages document representation. To represent a document, one assigns every page to a prototype in a codebook of pages. This leads to a histogram representation which can then be fed to any discriminative classifier. We also consider several refinements over this initial approach. We show on two challenging datasets that the proposed approach significantly outperforms a baseline system.
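The representation described in the abstract can be illustrated with a minimal sketch. It assumes each page has already been turned into a fixed-length feature vector, that the codebook is learned with plain k-means, and that pages are hard-assigned to the nearest prototype by Euclidean distance. None of these choices is stated in the abstract, the function names (`build_codebook`, `assign_pages`, `bag_of_pages`) are hypothetical, and the refinements the abstract mentions are not covered.

```python
import numpy as np

def assign_pages(page_features, codebook):
    """Return, for each page, the index of its nearest prototype (Euclidean)."""
    diffs = page_features[:, None, :] - codebook[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    return sq_dists.argmin(axis=1)

def build_codebook(page_features, k, n_iter=20, seed=0):
    """Learn k page prototypes with plain k-means (an assumption, not the paper's stated method)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(page_features), size=k, replace=False)
    centers = page_features[start].astype(float).copy()
    for _ in range(n_iter):
        labels = assign_pages(page_features, centers)
        for j in range(k):
            members = page_features[labels == j]
            if len(members) > 0:  # leave a prototype unchanged if no page maps to it
                centers[j] = members.mean(axis=0)
    return centers

def bag_of_pages(doc_pages, codebook):
    """L1-normalised histogram of prototype assignments; page order is ignored."""
    labels = assign_pages(doc_pages, codebook)
    hist = np.bincount(labels, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)  # guard against an empty document
```

Because the histogram only counts assignments, shuffling the pages of a document leaves its representation unchanged, which is what makes it suitable for unordered pages. The resulting vector can then be passed to any discriminative classifier.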
Keywords
document image processing; image classification; bag-of-pages document representation; codebook; discriminative classifier; histogram representation; unordered multi-page document classification; Accuracy; Feature extraction; Hidden Markov models; Histograms; Kernel; Training; Visualization; document classification; Fisher kernel
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition (ICPR), 2010 20th International Conference on
Conference_Location
Istanbul
ISSN
1051-4651
Print_ISBN
978-1-4244-7542-1
Type
conf
DOI
10.1109/ICPR.2010.473
Filename
5597249