DocumentCode :
2503936
Title :
A Bag-of-Pages Approach to Unordered Multi-page Document Classification
Author :
Gordo, Albert ; Perronnin, Florent
Author_Institution :
Comput. Vision Center, Univ. Autonoma de Barcelona, Barcelona, Spain
fYear :
2010
fDate :
23-26 Aug. 2010
Firstpage :
1920
Lastpage :
1923
Abstract :
We consider the problem of classifying documents containing multiple unordered pages. For this purpose, we propose a novel bag-of-pages document representation. To represent a document, one assigns every page to a prototype in a codebook of pages. This leads to a histogram representation, which can then be fed to any discriminative classifier. We also consider several refinements of this initial approach. We show on two challenging datasets that the proposed approach significantly outperforms a baseline system.
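The core representation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that each page has already been turned into a fixed-length feature vector, that the codebook of page prototypes has already been learned (for example with k-means, which is an assumption here), that every page is hard-assigned to its nearest prototype under Euclidean distance, and that the histogram is normalised by the page count. The function names `assign_page` and `bag_of_pages` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two page feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_page(page: Vector, codebook: List[Vector]) -> int:
    """Hard-assign a page to the index of its nearest prototype (ties -> lowest index)."""
    return min(range(len(codebook)), key=lambda k: squared_distance(page, codebook[k]))


def bag_of_pages(pages: List[Vector], codebook: List[Vector]) -> List[float]:
    """Histogram of page-to-prototype assignments, normalised by the page count.

    Page order is ignored, so any permutation of `pages` gives the same vector.
    """
    if not pages:
        raise ValueError("a document must contain at least one page")
    counts = Counter(assign_page(p, codebook) for p in pages)
    n = len(pages)
    return [counts.get(k, 0) / n for k in range(len(codebook))]


if __name__ == "__main__":
    # Toy 2-D "page features" and a 3-prototype codebook (illustrative values only).
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    doc = [[0.1, 0.0], [4.9, 5.2], [0.0, 0.2], [1.1, 0.9]]
    print(bag_of_pages(doc, codebook))  # [0.5, 0.25, 0.25]
```

The resulting fixed-length vector is what would then be passed to a discriminative classifier (e.g. a linear SVM or logistic regression; the choice is not specified here). Because the histogram only counts assignments, it is invariant to the order of the pages, which is what makes it suitable for unordered multi-page documents.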
Keywords :
document image processing; image classification; bag-of-pages document representation; codebook; discriminative classifier; histogram representation; unordered multi-page document classification; Accuracy; Feature extraction; Hidden Markov models; Histograms; Kernel; Training; Visualization; document classification; Fisher kernel;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition (ICPR), 2010 20th International Conference on
Conference_Location :
Istanbul
ISSN :
1051-4651
Print_ISBN :
978-1-4244-7542-1
Type :
conf
DOI :
10.1109/ICPR.2010.473
Filename :
5597249