Title of article :
Web-based text classification in the absence of manually labeled training documents
Author/Authors :
Chen-Ming Hung (author), Lee-Feng Chien (author)
Issue Information :
Monthly, serial issue, year 2007
Pages :
9
From page :
88
To page :
96
Abstract :
Most text classification techniques assume that manually labeled documents (corpora) are easy to obtain when learning text classifiers. However, labeled training documents are sometimes unavailable, or inadequate even when they are available. This article presents a self-learned approach that extracts high-quality training documents from the Web when the required manually labeled documents are unavailable or of poor quality. Learning a text classifier automatically requires only a set of user-defined categories and some highly related keywords. Extensive experiments evaluate the proposed approach on the test set of the Reuters-21578 news data set. The experiments show that very promising results can be achieved using only documents extracted automatically from the Web.
Journal title :
Journal of the American Society for Information Science and Technology
Serial Year :
2007
Record number :
993423