• DocumentCode
    3042877
  • Title

    A Novel Scheme for Term Weighting in Text Categorization: Positive Impact Factor

  • Author

    Emmanuel, M. ; Khatri, Saurabh M. ; Babu, D. R. Ramesh

  • Author_Institution
    Dept. of Inf. Technol., Pune Inst. of Comput. Technol., Pune, India
  • fYear
    2013
  • fDate
    13-16 Oct. 2013
  • Firstpage
    2292
  • Lastpage
    2297
  • Abstract
    Data mining and knowledge discovery techniques are now used in a wide variety of machine learning systems, and text categorization is an important area within machine learning. Feature selection and term weighting are two important steps that determine the outcome of any text categorization task. In this paper we focus on effective term weighting and propose a novel term weighting approach, the Positive Impact Factor (PIF). PIF is a supervised variation of traditional term weighting models. The idea behind the PIF scheme revolves around the assumption: "Positive impact of a feature to a category can be used to calculate its negative impact for other categories." To evaluate our weighting scheme we used the Classic 3 dataset from Cornell, which contains documents in 3 predefined categories. The results of our experiment, and a comparison with existing methods such as Binary, TF, TF-IDF, TF-RF, and TF-CHI2, show a remarkable improvement in accuracy with a significant reduction in computational cost.
  • Keywords
    data mining; learning (artificial intelligence); text analysis; PIF; feature selection; knowledge discovery; machine learning system; positive impact factor; term weighting; text categorization; Accuracy; Machine learning algorithms; Radio frequency; Support vector machines; Testing; Training; Vector Space model; Weighting Scheme
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on
  • Conference_Location
    Manchester
  • Type

    conf

  • DOI
    10.1109/SMC.2013.392
  • Filename
    6722145