DocumentCode
1747208
Title
Document filtering boosted by unlabeled data
Author
Park, Seong-Bae ; Zhang, Byoung-Tak
Author_Institution
Artificial Intelligence Lab., Seoul Nat. Univ., South Korea
Volume
1
fYear
2001
fDate
2001
Firstpage
328
Abstract
This paper describes three learning methods for document filtering that use unlabeled data. The proposed methods are based on a committee of classifiers that are trained on a small set of labeled data and then augmented with a large amount of unlabeled data. Taking advantage of unlabeled data significantly reduces the effective number of labeled data needed and increases the filtering accuracy. The use of unlabeled data is important because obtaining labeled data is difficult and time-consuming, while unlabeled data are abundant. For all proposed methods, the experimental results show that accuracy improves by up to 9.2% with only two-thirds as many labeled data as the method that does not use unlabeled data.
Keywords
document handling; information retrieval; learning (artificial intelligence); AdaBoost method; EM-like method; active sampling method; classifiers; document filtering; labeled data; learning methods; unlabeled data; Artificial intelligence; Bagging; Computer science; Data engineering; Filtering; Filters; Humans; Labeling; Machine learning algorithms; Text processing
fLanguage
English
Publisher
IEEE
Conference_Title
Industrial Electronics, 2001. Proceedings. ISIE 2001. IEEE International Symposium on
Conference_Location
Pusan
Print_ISBN
0-7803-7090-2
Type
conf
DOI
10.1109/ISIE.2001.931808
Filename
931808