• DocumentCode
    397065
  • Title
    A topic-specific data filtering framework based on rough set theory
  • Author
    Hong Guo ; Cao, Yunda ; Guo, Song
  • Author_Institution
    Beijing Inst. of Technol., China
  • Volume
    2
  • fYear
    2003
  • fDate
    4-7 May 2003
  • Firstpage
    1095
  • Abstract
    With the tremendous growth in the volume of text documents available on the Internet and in digital libraries, accurate topic-specific text filtering is needed. In this paper we propose a rough-set-aided method to reduce the dimensionality of feature vectors. To extract accurate features, we also provide a novel filtering technique called twice-filtering, which handles two different feature sets: "interkeywords" and "intrakeyword". A simple E-mail filtering application based on our topic-specific filtering technology shows that, by incorporating variant weighting methods and more accurately extracted features, our filtering algorithm speeds up the filtering operation while achieving high precision and recall.
  • Keywords
    Internet; electronic mail; feature extraction; information filters; rough set theory; text analysis; DP; E-mail filtering system; TF-IDF; digital libraries; document filtering; feature vectors dimensionality; interkeywords; intrakeyword; topic-specific data filtering framework; twice-filtering; variant weighting methods; Digital filters; Filtering algorithms; Filtering theory; Information filtering; Set theory; Software libraries
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Electrical and Computer Engineering, 2003. IEEE CCECE 2003. Canadian Conference on
  • ISSN
    0840-7789
  • Print_ISBN
    0-7803-7781-8
  • Type
    conf
  • DOI
    10.1109/CCECE.2003.1226087
  • Filename
    1226087