Regional Information Center for Science and Technology (RICeST) - Document filtering boosted by unlabeled data

DocumentCode :

1747208

Title :

Document filtering boosted by unlabeled data

Author :

Park, Seong-Bae ; Zhang, Byoung-Tak

Author_Institution :

Artificial Intelligence Lab., Seoul Nat. Univ., South Korea

Volume :

fYear :

2001

fDate :

2001

Firstpage :

328

Abstract :

This paper describes three learning methods for document filtering that use unlabeled data. The proposed methods are based on a committee of classifiers that is trained on a small set of labeled data and then augmented with a large amount of unlabeled data. By taking advantage of unlabeled data, the effective number of labeled examples needed is significantly reduced and the filtering accuracy is increased. The use of unlabeled data is important because obtaining labeled data is difficult and time-consuming, while unlabeled data are abundant. For all proposed methods, the experimental results show that accuracy improves by up to 9.2% with only two-thirds as many labeled data as the method that does not use unlabeled data.

Keywords :

document handling; information retrieval; learning (artificial intelligence); AdaBoost method; EM-like method; active sampling method; classifiers; document filtering; labeled data; learning methods; unlabeled data; Artificial intelligence; Bagging; Computer science; Data engineering; Filtering; Filters; Humans; Labeling; Machine learning algorithms; Text processing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Industrial Electronics, 2001. Proceedings. ISIE 2001. IEEE International Symposium on

Conference_Location :

Pusan

Print_ISBN :

0-7803-7090-2

Type :

conf

DOI :

10.1109/ISIE.2001.931808

Filename :

931808

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1747208