DocumentCode :
3487460
Title :
A Stream-Based Semi-supervised Active Learning Approach for Document Classification
Author :
Bouguelia, Mohamed-Rafik ; Belaid, Yolande ; Belaid, Abdel
Author_Institution :
Univ. de Lorraine - LORIA, Vandoeuvre-les-Nancy, France
fYear :
2013
fDate :
25-28 Aug. 2013
Firstpage :
611
Lastpage :
615
Abstract :
We consider an industrial context in which a stream of unlabelled documents becomes available progressively over time. Based on an adaptive incremental neural gas algorithm (AING), we propose a new stream-based semi-supervised active learning method (A2ING) for document classification, which actively queries a human annotator for the class labels of the documents that are most informative for learning, according to an uncertainty measure. The method maintains a model as a dynamically evolving graph topology of labelled document representatives that we call neurons. Experiments on different real datasets show that the proposed method requires on average only 36.3% of the incoming documents to be labelled in order to learn a model that achieves an average gain of 2.15-3.22% in precision compared to traditional supervised learning with fully labelled training documents.
Keywords :
graph theory; learning (artificial intelligence); pattern classification; query processing; text analysis; A2ING; AING algorithm; adaptive incremental neural gas algorithm; document class-label querying; document classification; dynamically evolving graph topology; labelled document-representatives; stream-based semisupervised active learning method; uncertainty measure; unlabelled documents; Labeling; Measurement uncertainty; Neurons; Testing; Topology; Training; Uncertainty; Active learning; Document classification; Incremental learning; data stream; semi-supervised learning;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
ISSN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2013.126
Filename :
6628691