DocumentCode
397065
Title
A topic-specific data filtering framework based on rough set theory
Author
Hong Guo ; Cao, Yunda ; Guo, Song
Author_Institution
Beijing Inst. of Technol., China
Volume
2
fYear
2003
fDate
4-7 May 2003
Firstpage
1095
Abstract
With the tremendous growth in the volume of text documents available on the Internet and in digital libraries, accurate topic-specific text filtering is needed. In this paper we propose a rough-set-aided method to reduce the dimensionality of feature vectors. To extract accurate features, we also provide a novel filtering technique called twice-filtering, which handles two different feature sets: "interkeywords" and "intrakeyword". A simple application, an E-mail filtering system based on our topic-specific filtering technology, shows that by incorporating variant weighting methods and extracting more accurate features, our filtering algorithm can speed up the filtering operation with high precision and recall.
Keywords
Internet; electronic mail; feature extraction; information filters; rough set theory; text analysis; DP; E-mail filtering system; TF-IDF; digital libraries; document filtering; feature vectors dimensionality; interkeywords; intrakeyword; topic-specific data filtering framework; twice-filtering; variant weighting methods; Digital filters; Filtering algorithms; Filtering theory; Information filtering; Set theory; Software libraries
fLanguage
English
Publisher
IEEE
Conference_Titel
Electrical and Computer Engineering, 2003. IEEE CCECE 2003. Canadian Conference on
ISSN
0840-7789
Print_ISBN
0-7803-7781-8
Type
conf
DOI
10.1109/CCECE.2003.1226087
Filename
1226087