• DocumentCode
    1797188
  • Title
    A framework of preparing corpora from social network sites for sentiment analysis
  • Author
    Medhat, Walaa ; Yousef, Ahmed H. ; Korashy, Hoda
  • Author_Institution
    Sch. of Electron. Eng., Canadian Int. Coll., Cairo, Egypt
  • fYear
    2014
  • fDate
    10-12 Nov. 2014
  • Firstpage
    32
  • Lastpage
    39
  • Abstract
    This paper proposes a framework for preparing and using corpora from online social networks and review sites for the sentiment analysis task. The framework consists of three phases. The first phase is preprocessing and cleaning the collected data, followed by data annotation. The second phase applies several text processing techniques to the prepared corpora: removing stopwords, replacing each negation word and the negated word that follows it with the antonym of the negated word, and keeping only words with selected part-of-speech tags (adjectives and verbs). The third phase is text classification using Naïve Bayes and Decision Tree classifiers with two feature selection approaches, unigrams and bigrams. The experiments show that the data is extremely unbalanced. The results show that applying the text processing techniques improves the classification accuracy of the Naïve Bayes classifier and reduces the training time of both classifiers. The results also show that the Decision Tree classifier is more suitable for imbalanced data.
  • Keywords
    Bayes methods; decision trees; feature selection; pattern classification; social networking (online); text analysis; Naive Bayes classifiers; bigrams; corpora; data annotation; decision tree classifiers; feature selection approaches; sentiment analysis; social network sites; text classification; text processing techniques; unigrams; Accuracy; Facebook; Motion pictures; Niobium; Text processing; Training; Twitter; Feature Selection; Preparing Corpora; Sentiment Analysis; Social Network;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Society (i-Society), 2014 International Conference on
  • Conference_Location
    London
  • Type
    conf
  • DOI
    10.1109/i-Society.2014.7009006
  • Filename
    7009006
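
  • Example
    The second-phase text processing and the unigram/bigram features described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stopword list, negation list, and antonym table are tiny hypothetical stand-ins, the function names are invented for this sketch, and part-of-speech filtering and the classifiers themselves are left out.

```python
# Illustrative sketch only. STOPWORDS, NEGATIONS and ANTONYMS are small
# hypothetical stand-ins, not the resources used in the paper.
STOPWORDS = {"the", "a", "an", "is", "was", "this", "it"}
NEGATIONS = {"not", "never", "no"}
ANTONYMS = {"good": "bad", "bad": "good", "like": "dislike"}


def replace_negations(tokens):
    """Replace a negation word plus the word after it with that word's antonym."""
    out = []
    i = 0
    while i < len(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tokens[i] in NEGATIONS and nxt in ANTONYMS:
            out.append(ANTONYMS[nxt])
            i += 2  # skip both the negation word and the negated word
        else:
            out.append(tokens[i])
            i += 1
    return out


def preprocess(text):
    """Lowercase, split on whitespace, drop stopwords, then rewrite negations."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return replace_negations(tokens)


def ngrams(tokens, n):
    """Return word n-grams (n=1 for unigrams, n=2 for bigrams) as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = preprocess("The movie was not good")
unigram_features = ngrams(tokens, 1)
bigram_features = ngrams(tokens, 2)
```

    The resulting unigram or bigram lists would then be turned into feature vectors for a classifier such as Naïve Bayes or a decision tree. A negation whose following word has no entry in the antonym table is left unchanged in this sketch.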