• DocumentCode
    2741775
  • Title
    A Text Classification Method with an Effective Feature Extraction Based on Category Analysis
  • Author
    Li, Yun ; Sheng, Yan ; Luan, Luan ; Chen, Ling
  • Author_Institution
    Sch. of Inf. Eng., Yangzhou Univ., Yangzhou, China
  • Volume
    1
  • fYear
    2009
  • fDate
    14-16 Aug. 2009
  • Firstpage
    95
  • Lastpage
    99
  • Abstract
    Text classification determines the class of an unknown text according to its content within a given classification system. To express as much of the text's information as possible with as few features as possible, the paper analyzes the statistical properties of the various features and extracts global features according to Zipf's law. Then, based on a statistical analysis of the features' category information, efficient features are extracted by computing each feature's contribution. Next, the traditional TF-IDF formula is improved using category frequencies, giving a formula named TF-IDF-CF for calculating feature weights. Finally, a text classification method is proposed. The experimental results illustrate that the feature extraction methods proposed in the paper are effective and that the TF-IDF-CF formula for calculating feature weights achieves higher classification accuracy.
  • Keywords
    feature extraction; statistical analysis; text analysis; word processing; Zipf law; category analysis; information classification; text classification method; Costs; Data mining; Frequency; Fuzzy systems; Information analysis; Information management; Knowledge engineering; Text categorization; Category Frequency; Feature Weight
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery, 2009. FSKD '09. Sixth International Conference on
  • Conference_Location
    Tianjin
  • Print_ISBN
    978-0-7695-3735-1
  • Type
    conf
  • DOI
    10.1109/FSKD.2009.304
  • Filename
    5358642