Title :
Toward semi-automatic construction of training-corpus for text classification
Author :
Guan, Jihong ; Zhou, Shuigeng
Author_Institution :
Sch. of Comput. Sci., Wuhan Univ., China
Abstract :
Text classification is becoming increasingly important with the rapid growth of available online information. It has been observed that the quality of the training corpus affects the performance of the trained classifier. This paper proposes an approach to building high-quality training corpora for better classification performance: it first explores the properties of training corpora and then gives an algorithm for constructing them semi-automatically. Preliminary experimental results validate the approach: classifiers trained on corpora constructed this way achieve good performance while the size of the training corpus is significantly reduced. The approach can be used to build an efficient and lightweight classification system.
Keywords :
classification; information retrieval; natural languages; text analysis; Chinese text; experimental results; natural language; online information; performance; semi-automatic training corpus development; text classification; Algorithm design and analysis; Buildings; Computer science; Information retrieval; Machine learning; Organizing; Pattern recognition; Software engineering; Text categorization
Conference_Title :
Systems, Man and Cybernetics, 2002 IEEE International Conference on
Print_ISBN :
0-7803-7437-1
DOI :
10.1109/ICSMC.2002.1173245