• DocumentCode
    2184075
  • Title

    Categorical term descriptor: a proposed term weighting scheme for feature selection

  • Author

    How, Bong Chih ; Kulathuramaiyer, Narayanan ; Kiong, Wong Ting

  • Author_Institution
    Fac. of Comput. Sci. & Inf. Technol., Universiti Malaysia Sarawak, Malaysia
  • fYear
    2005
  • fDate
    19-22 Sept. 2005
  • Firstpage
    313
  • Lastpage
    316
  • Abstract
    This paper proposes a term weighting scheme, categorical term descriptor (CTD), for feature selection in automated text categorization. CTD is an adaptation of term frequency inverse document frequency (TFIDF). We compared the performance of the proposed method against classical methods such as correlation coefficient, chi-square and information gain, using the multinomial naive Bayes and support vector machine (SVM) classifiers on the Reuters(10) and Reuters(115) variants of the Reuters-21578 dataset. Despite its simplicity, CTD has proven to be promising for both local and global feature selection. CTD works best on Reuters(10) as a stable local feature selection (FS) method.
  • Keywords
    Bayes methods; pattern classification; support vector machines; text analysis; automated text categorization; categorical term descriptor; chi-square method; correlation coefficient; feature selection; multinomial naive Bayes; support vector machine classifier; term frequency inverse document frequency; term weighting scheme; Computer Society;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM International Conference on
  • Print_ISBN
    0-7695-2415-X
  • Type

    conf

  • DOI
    10.1109/WI.2005.46
  • Filename
    1517863