• DocumentCode
    2767858
  • Title

    A Text Categorization Method Based on Local Document Frequency

  • Author

    Xia, Feng ; Jicun, Tian ; Zhihui, Liu

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Civil Aviation Univ. of China, Tianjin, China
  • Volume
    7
  • fYear
    2009
  • fDate
    14-16 Aug. 2009
  • Firstpage
    468
  • Lastpage
    471
  • Abstract
    In this paper, a fast and effective text categorization method named TCBLDF is proposed. TCBLDF needs almost no dimensionality reduction beyond stop-word removal and document-frequency-based feature selection. It tries to capture the relationship between a term and a category label, thus eliminating the need to know the semantic contribution a term makes to the document in which it occurs. TCBLDF uses a measure to evaluate the importance of each term for the categorization task and then weights the terms according to these importance evaluations, so that important terms have more influence on the classification decision. Finally, we compare the method with two conventional classification methods, Naive Bayes and a linear SVM. Experimental results show that TCBLDF is faster than the SVM with comparable performance and more effective than Naive Bayes, and thus can be a good alternative to these methods.
  • Keywords
    Bayes methods; classification; feature extraction; support vector machines; category label; classification decision; dimensionality reduction; feature selection; importance evaluations; linear SVM learning method; local document frequency; naive Bayesian learning; text categorization method; Bayesian methods; Classification tree analysis; Computer science; Frequency; Fuzzy systems; Learning systems; Machine learning; Support vector machine classification; Text categorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery, 2009. FSKD '09. Sixth International Conference on
  • Conference_Location
    Tianjin
  • Print_ISBN
    978-0-7695-3735-1
  • Type

    conf

  • DOI
    10.1109/FSKD.2009.291
  • Filename
    5360054