• DocumentCode
    1747208
  • Title
    Document filtering boosted by unlabeled data
  • Author
    Park, Seong-Bae ; Zhang, Byoung-Tak
  • Author_Institution
    Artificial Intelligence Lab., Seoul Nat. Univ., South Korea
  • Volume
    1
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    328
  • Abstract
    This paper describes three learning methods for document filtering that use unlabeled data. The proposed methods are based on a committee of classifiers that are trained on a small set of labeled data and then augmented by a large amount of unlabeled data. By taking advantage of unlabeled data, the number of labeled examples needed is significantly reduced and the filtering accuracy is increased. The use of unlabeled data is important because obtaining labeled data is difficult and time-consuming, while unlabeled data are abundant. For all proposed methods, the experimental results show that accuracy improves by up to 9.2% with only two-thirds as many labeled data as the method that does not use unlabeled data.
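    The general idea in the abstract (a committee trained on a few labeled documents, then augmented with unlabeled ones) can be illustrated with a minimal sketch. This is not the authors' AdaBoost, EM-like, or active sampling method; it is a generic committee-based self-training loop. The word-score classifier, the bootstrap committee, the agreement threshold, and every function name below are assumptions made for illustration.

```python
import random
from collections import Counter


def train(docs):
    """Train a toy word-score classifier from (text, label) pairs, label in {+1, -1}."""
    scores = Counter()
    for text, label in docs:
        for word in set(text.split()):
            scores[word] += label
    return scores


def predict(scores, text):
    """Sum the scores of the words in the text; ties go to +1 (relevant)."""
    total = sum(scores[w] for w in set(text.split()))
    return 1 if total >= 0 else -1


def build_committee(labeled, size, rng):
    """Assumed committee: each member is trained on a bootstrap resample."""
    return [train([rng.choice(labeled) for _ in labeled]) for _ in range(size)]


def committee_vote(committee, text):
    """Return (majority label, fraction of members that agree with it)."""
    votes = [predict(member, text) for member in committee]
    pos = votes.count(1)
    label = 1 if pos * 2 >= len(votes) else -1
    agree = max(pos, len(votes) - pos) / len(votes)
    return label, agree


def self_train(labeled, unlabeled, size=5, rounds=3, threshold=1.0, seed=0):
    """Move unlabeled documents into the labeled set when the committee agrees."""
    rng = random.Random(seed)
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        committee = build_committee(labeled, size, rng)
        remaining = []
        for text in pool:
            label, agree = committee_vote(committee, text)
            if agree >= threshold:
                labeled.append((text, label))  # adopt the committee's label
            else:
                remaining.append(text)  # keep for a later round
        pool = remaining
    return build_committee(labeled, size, rng), labeled


# Hypothetical toy data: -1 = filter out, +1 = keep.
seed_docs = [
    ("cheap pills offer", -1),
    ("meeting agenda notes", 1),
    ("cheap offer now", -1),
    ("project meeting notes", 1),
]
extra_docs = ["cheap pills now", "agenda project notes"]
committee, augmented = self_train(seed_docs, extra_docs)
```

    With `threshold=1.0`, an unlabeled document is adopted only when every committee member assigns it the same label; lowering the threshold adds more documents at the risk of more label noise.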
  • Keywords
    document handling; information retrieval; learning (artificial intelligence); AdaBoost method; EM-like method; active sampling method; classifiers; document filtering; labeled data; learning methods; unlabeled data; Artificial intelligence; Bagging; Computer science; Data engineering; Filtering; Filters; Humans; Labeling; Machine learning algorithms; Text processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Industrial Electronics, 2001. Proceedings. ISIE 2001. IEEE International Symposium on
  • Conference_Location
    Pusan
  • Print_ISBN
    0-7803-7090-2
  • Type
    conf
  • DOI
    10.1109/ISIE.2001.931808
  • Filename
    931808