Regional Information Center for Science and Technology - Sample-Dependent Feature Selection for Faster Document Image Categorization

DocumentCode :

2143697

Title :

Sample-Dependent Feature Selection for Faster Document Image Categorization

Author :

Louradour, Jérôme ; Kermorvant, Christopher

Author_Institution :

A2iA, Paris, France

fYear :

2011

fDate :

18-21 Sept. 2011

Firstpage :

309

Lastpage :

313

Abstract :

In document image classification, some classes of documents can be easily identified using pixel-level features, whereas other distinctions can only be made using semantics, which usually requires a full automatic text transcription. To be as efficient as possible, the classification system should avoid extracting high-level, time-consuming features when they are not necessary to classify with confidence. We introduce this problem of sample-dependent feature selection, which, to our knowledge, has not been addressed before. We propose a method to tackle it that generalizes to any classifier that provides a confidence score along with its prediction. Empirical results using AdaBoost on three mail classification problems show that our approach significantly improves classification efficiency (up to 40% less CPU time) without significant loss of accuracy compared to the baseline.
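The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it only assumes that feature extractors are ordered from cheapest to costliest and that each classifier returns a label with a confidence score. All names here (`classify_with_early_exit`, `pixel_features`, `text_features`, the toy classifiers, the document fields and the 0.9 threshold) are hypothetical and chosen for illustration; the record does not describe how the paper computes or calibrates confidence.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: an extractor turns a document into feature values,
# a classifier maps all features seen so far to (label, confidence).
Extractor = Callable[[dict], List[float]]
Classifier = Callable[[Sequence[float]], Tuple[str, float]]


def classify_with_early_exit(sample, stages, threshold):
    """Run stages from cheapest to costliest and stop once confident.

    Returns (label, confidence, number_of_stages_used).
    """
    if not stages:
        raise ValueError("at least one stage is required")
    features: List[float] = []
    label, confidence, used = "", 0.0, 0
    for used, (extract, classify) in enumerate(stages, start=1):
        features.extend(extract(sample))       # pay for this stage's features
        label, confidence = classify(features)
        if confidence >= threshold:            # confident enough: skip the rest
            break
    return label, confidence, used


# Toy stand-ins (assumptions): a cheap pixel-level feature and a costly
# "transcription" feature, each with a hand-written classifier.
def pixel_features(doc: dict) -> List[float]:
    return [doc["dark_ratio"]]


def text_features(doc: dict) -> List[float]:
    return [float("invoice" in doc["text"].lower())]


def pixel_classifier(f: Sequence[float]) -> Tuple[str, float]:
    # Confident only for nearly blank pages.
    return ("blank", 0.95) if f[0] < 0.05 else ("letter", 0.55)


def full_classifier(f: Sequence[float]) -> Tuple[str, float]:
    # Uses both the pixel feature and the text feature.
    return ("invoice", 0.9) if f[1] else ("letter", 0.8)


STAGES = [(pixel_features, pixel_classifier), (text_features, full_classifier)]

if __name__ == "__main__":
    easy = {"dark_ratio": 0.01, "text": ""}
    hard = {"dark_ratio": 0.30, "text": "Invoice 12"}
    print(classify_with_early_exit(easy, STAGES, threshold=0.9))
    print(classify_with_early_exit(hard, STAGES, threshold=0.9))
```

In this sketch the nearly blank page is classified from the pixel-level feature alone, while the second document falls below the threshold after the first stage and triggers the costlier text feature. Raising the threshold trades CPU time for confidence, which is the kind of trade-off the abstract reports.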

Keywords :

document image processing; feature extraction; image classification; learning (artificial intelligence); text analysis; AdaBoost; automatic text transcription; document image categorization; document image classification; pixel-level features; sample-dependent feature selection; Accuracy; Calibration; Databases; Error analysis; Estimation; Feature extraction; Machine learning; Image document classification; confidence-rated multi-label classification; feature selection;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Document Analysis and Recognition (ICDAR), 2011 International Conference on

Conference_Location :

Beijing

ISSN :

1520-5363

Print_ISBN :

978-1-4577-1350-7

Electronic_ISBN :

1520-5363

Type :

conf

DOI :

10.1109/ICDAR.2011.70

Filename :

6065325

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2143697