• DocumentCode
    3564964
  • Title
    Sentiment Feature Selection Algorithm for Chinese Micro-blog
  • Author
    Yu Jian Kun ; Zhao Lei
  • Author_Institution
    Sch. of Inf., Yunnan Univ. of Finance & Econ., Kunming, China
  • fYear
    2014
  • Firstpage
    114
  • Lastpage
    118
  • Abstract
    Sentiment analysis extracts users' opinions from text documents. Sentiment classification using machine learning (ML) methods faces the problem of handling a huge number of unique terms in the feature vector. A feature selection method is therefore required to eliminate irrelevant and noisy features from the feature vector so that ML algorithms work efficiently. Feature selection based on rough set theory performs well on English text but poorly on Chinese text. This paper proposes improved feature selection methods that are based on rough set theory and adapted to Chinese micro-blogs. We name them IGAR (IG and Rough set) and CHIAR (CHI and Rough set). Their performance is compared with that of the Information Gain (IG) method, which has been identified as one of the best feature selection methods for sentiment classification. Experiments were performed on two datasets extracted from Sina microblog. The experimental results show that the improved feature selection methods outperform the other feature selection methods.
  • Keywords
    Web sites; feature selection; learning (artificial intelligence); pattern classification; rough set theory; text analysis; CHI and rough set; CHIAR; Chinese micro-blog; IG and rough set; IGAR; ML algorithm; Sina microblog; feature vector; information gain; machine learning methods; opinion extraction; rough set theory based feature selection method; sentiment analysis; sentiment classification; sentiment feature selection algorithm; text documents; Accuracy; Algorithm design and analysis; Classification algorithms; Feature extraction; Niobium; Sentiment analysis; Support vector machines; Feature selection; Micro-blog sentiment; Rough set; Sentiment analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Management of e-Commerce and e-Government (ICMeCG), 2014 International Conference on
  • Type
    conf
  • DOI
    10.1109/ICMeCG.2014.32
  • Filename
    7046901