• DocumentCode
    3722747
  • Title
    A Hybrid Feature Selection Method for Vietnamese Text Classification
  • Author
    Nguyen Tri Hai;Nguyen Hoang Nghia;Tuan Dinh Le;Vu Thanh Nguyen
  • Author_Institution
    Univ. of Inf. Technol., Ho Chi Minh City, Vietnam
  • fYear
    2015
  • Firstpage
    91
  • Lastpage
    96
  • Abstract
    Text classification is an important task because of the huge volume of electronic documents. One of its main challenges is the high dimensionality of the feature space. Feature selection for English text classification has been studied extensively, but few studies address Vietnamese text classification. This paper evaluates the performance of three widely used feature selection methods [2][6][10]: Chi-square (CHI), Information Gain (IG), and Document Frequency (DF). Based on this evaluation, we propose a hybrid feature selection method, called SIGCHI, which combines the Chi-square and Information Gain feature selection methods. Our experimental results show that the proposed method performs significantly better than the other methods: the accuracy of SIGCHI is up to 15.03% higher than that of CHI, up to 18.65% higher than that of IG, and up to 27.72% higher than that of DF.
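    For orientation, the three baseline scores named in the abstract can be sketched as follows. This is a minimal illustration using the common textbook definitions for a single term and a single category versus the rest, which is an assumption here; the abstract does not state the exact formulas the authors used, nor how SIGCHI combines CHI and IG, so no combination step is shown. The function names and the count layout (a, b, c, d) are hypothetical.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)


# Assumed count layout for one term t and one category k:
#   a = documents in k that contain t
#   b = documents outside k that contain t
#   c = documents in k that do not contain t
#   d = documents outside k that do not contain t

def document_frequency(a, b, c, d):
    """DF: number of documents that contain the term."""
    return a + b


def chi_square(a, b, c, d):
    """CHI: chi-square statistic of the 2x2 term/category table."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def information_gain(a, b, c, d):
    """IG: reduction in category entropy from knowing term presence."""
    n = a + b + c + d
    prior = entropy([a + c, b + d])
    with_term = entropy([a, b])
    without_term = entropy([c, d])
    return prior - (a + b) / n * with_term - (c + d) / n * without_term
```

    A term would typically be scored against each category and the highest-scoring terms kept as features; the ranking and cutoff policy is left open here because the abstract does not describe it.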
  • Keywords
    "Text categorization","Support vector machines","Training","Electronic mail","Feature extraction","Information technology","Cities and towns"
  • Publisher
    ieee
  • Conference_Titel
    Knowledge and Systems Engineering (KSE), 2015 Seventh International Conference on
  • Type
    conf
  • DOI
    10.1109/KSE.2015.25
  • Filename
    7371764