Title :
A topic-specific data filtering framework based on rough set theory
Author :
Hong Guo ; Cao, Yunda ; Guo, Song
Author_Institution :
Beijing Inst. of Technol., China
Abstract :
With the tremendous growth in the volume of text documents available on the Internet and in digital libraries, accurate topic-specific text filtering is needed. In this paper we propose a rough-set-aided method to reduce the dimensionality of feature vectors. To extract accurate features, we also provide a novel filtering technique, called twice-filtering, that handles two different feature sets: "interkeywords" and "intrakeyword". A simple application, an e-mail filtering system based on our topic-specific filtering technology, shows that by incorporating variant weighting methods and more accurately extracted features, our filtering algorithm can speed up the filtering operation with high precision and recall.
Keywords :
Internet; electronic mail; feature extraction; information filters; rough set theory; text analysis; DP; E-mail filtering system; TF-IDF; digital libraries; document filtering; feature vectors dimensionality; interkeywords; intrakeyword; topic-specific data filtering framework; twice-filtering; variant weighting methods; Digital filters; Filtering algorithms; Filtering theory; Information filtering; Set theory; Software libraries
Conference_Title :
Electrical and Computer Engineering, 2003. IEEE CCECE 2003. Canadian Conference on
Print_ISBN :
0-7803-7781-8
DOI :
10.1109/CCECE.2003.1226087