DocumentCode :
2424332
Title :
A rough set-based hybrid method to text categorization
Author :
Bao, Yongguang ; Aoyama, Satoshi ; Du, Xiaoyong ; Yamada, Kazutaka ; Ishii, Naoho
Author_Institution :
Dept. of Intelligence & Comput. Sci., Nagoya Inst. of Technol., Japan
Volume :
1
fYear :
2001
fDate :
3-6 Dec. 2001
Firstpage :
254
Abstract :
In this paper we present a hybrid text categorization method based on Rough Sets theory. A central problem in achieving good text classification for information filtering and retrieval (IF/IR) is the high dimensionality of the data, which may contain many unnecessary and irrelevant features. To cope with this problem, we propose a hybrid technique using Latent Semantic Indexing (LSI) and Rough Sets theory (RS). Given corpora of documents and a training set of examples of classified documents, the technique locates a minimal set of co-ordinate keywords to distinguish between classes of documents, reducing the dimensionality of the keyword vectors. This simplifies the creation of knowledge-based IF/IR systems, speeds up their operation, and allows easy editing of the rule bases employed. In addition, we generate several knowledge bases instead of one for the classification of new objects, in the hope that combining the answers of the multiple knowledge bases results in better performance. Multiple knowledge bases can be formulated precisely and in a unified way within the framework of RS. This paper describes the proposed technique, discusses the integration of a keyword acquisition algorithm, Latent Semantic Indexing (LSI), with a Rough Set-based rule generation algorithm, and provides experimental results. The test results show that the hybrid method performs better than the previous rough set-based approach.
Keywords :
classification; information retrieval; rough set theory; Latent Semantic Indexing; hybrid text categorization; information filtering; information retrieval; rough sets theory; text classification; text documents; Computer science; Indexing; Information filtering; Information retrieval; Large scale integration; Machine learning; Machine learning algorithms; Rough sets; Testing; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Information Systems Engineering, 2001. Proceedings of the Second International Conference on
Print_ISBN :
0-7695-1393-X
Type :
conf
DOI :
10.1109/WISE.2001.996486
Filename :
996486