• DocumentCode
    477789
  • Title

    Text Feature Extraction Based on Rough Set

  • Author

    Cheng, Yiyuan ; Zhang, Ruiling ; Wang, Xiufeng ; Chen, Qiushuang

  • Author_Institution
    Coll. of Inf. Tech. Sci., Nankai Univ., Tianjin
  • Volume
    2
  • fYear
    2008
  • fDate
    18-20 Oct. 2008
  • Firstpage
    310
  • Lastpage
    314
  • Abstract
    This paper proposes a method for text feature extraction based on rough sets (TFERS). First, a new formulation of attribute significance is presented, based on the classification capability of condition attributes; it avoids the recalculation of attribute significance at each iteration of the reduction procedure that conventional rough-set-based methods require. Second, attribute correlation analysis is incorporated, which helps to achieve a satisfactory reduction of text features. In the text preprocessing phase, the typical vector space representation is extended from the term level to the concept ('synset') level using WordNet. This addresses the synonym problem and substantially reduces the dimension of the feature vector. Simulation experiments and applications in text classification show that TFERS can significantly improve classification performance.
  • Keywords
    classification; correlation methods; data reduction; feature extraction; iterative methods; rough set theory; text analysis; WordNet; attribute correlation analysis; iteration method; reduction procedure; rough set theory; text feature extraction; text feature reduction; text preprocessing phase; vector space representation; Analytical models; Computational modeling; Educational institutions; Feature extraction; Fuzzy systems; Internet; Set theory; Text categorization; Text mining; Web pages; Text feature extraction; attribute significance; reduction; rough set
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
  • Conference_Location
    Shandong
  • Print_ISBN
    978-0-7695-3305-6
  • Type

    conf

  • DOI
    10.1109/FSKD.2008.521
  • Filename
    4666129