DocumentCode :
2143697
Title :
Sample-Dependent Feature Selection for Faster Document Image Categorization
Author :
Louradour, Jérôme ; Kermorvant, Christopher
Author_Institution :
A2iA, Paris, France
fYear :
2011
fDate :
18-21 Sept. 2011
Firstpage :
309
Lastpage :
313
Abstract :
In document image classification, some classes of documents can be easily identified using pixel-level features, whereas other distinctions can only be made using semantics, which usually requires a full automatic text transcription. To be as efficient as possible, the classification system should avoid extracting high-level, time-consuming features when they are not needed to classify with confidence. We introduce this problem of sample-dependent feature selection, which, to our knowledge, has not been addressed before. We propose a method to tackle it that generalizes to any classifier that provides a confidence score along with its prediction. Empirical results using AdaBoost on three mail classification problems show that our approach significantly improves classification efficiency (up to 40% less CPU time) without significant loss of accuracy compared to the baseline.
Keywords :
document image processing; feature extraction; image classification; learning (artificial intelligence); text analysis; AdaBoost; automatic text transcription; document image categorization; document image classification; pixel-level features; sample-dependent feature selection; Accuracy; Calibration; Databases; Error analysis; Estimation; Feature extraction; Machine learning; Image document classification; confidence-rated multi-label classification; feature selection;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
ISSN :
1520-5363
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2011.70
Filename :
6065325
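The abstract's central idea, skipping expensive features whenever a cheaper stage is already confident, can be illustrated with a minimal sketch. This is not the authors' method: the record does not give their algorithm, and the function `classify_with_early_exit`, the fixed `threshold`, the stage layout, and the toy extractors and classifiers are all hypothetical names and values chosen for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types (assumptions, not from the paper):
# an extractor maps a document to a feature vector, and a classifier
# maps all features gathered so far to a (label, confidence) pair.
Extractor = Callable[[str], Sequence[float]]
Classifier = Callable[[Sequence[float]], Tuple[str, float]]
Stage = Tuple[Extractor, Classifier]


def classify_with_early_exit(
    doc: str, stages: Sequence[Stage], threshold: float = 0.9
) -> Tuple[str, float, int]:
    """Run stages from cheapest to most expensive and stop once confident.

    Returns (label, confidence, number_of_stages_used).
    """
    if not stages:
        raise ValueError("at least one stage is required")
    features: List[float] = []
    label, confidence, used = "", 0.0, 0
    for used, (extract, classify) in enumerate(stages, start=1):
        features.extend(extract(doc))        # pay for this stage's features
        label, confidence = classify(features)
        if confidence >= threshold:          # confident enough: skip the rest
            break
    return label, confidence, used


# Toy stand-ins for a cheap pixel-level stage and a costly text-level stage.
def pixel_features(doc: str) -> List[float]:
    return [float(len(doc))]


def text_features(doc: str) -> List[float]:
    return [1.0 if "invoice" in doc.lower() else 0.0]


def cheap_classifier(f: Sequence[float]) -> Tuple[str, float]:
    # Confident only for very short documents.
    return ("form", 0.95) if f[0] < 20 else ("letter", 0.5)


def full_classifier(f: Sequence[float]) -> Tuple[str, float]:
    return ("invoice", 0.9) if f[1] == 1.0 else ("letter", 0.8)


STAGES: List[Stage] = [
    (pixel_features, cheap_classifier),
    (text_features, full_classifier),
]
```

With these toy stages, a short document is labeled after the first stage alone, while a longer one falls through to the text-level stage. In practice the confidence threshold would have to be tuned on held-out data to trade CPU time against accuracy.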