DocumentCode :
1797188
Title :
A framework of preparing corpora from social network sites for sentiment analysis
Author :
Medhat, Walaa ; Yousef, Ahmed H. ; Korashy, Hoda
Author_Institution :
Sch. of Electron. Eng., Canadian Int. Coll., Cairo, Egypt
fYear :
2014
fDate :
10-12 Nov. 2014
Firstpage :
32
Lastpage :
39
Abstract :
This paper proposes a framework for preparing and using corpora from online social networks and review sites for the sentiment analysis task. The framework consists of three phases. The first phase is preprocessing and cleaning the collected data, followed by data annotation. The second phase applies various text processing techniques to the prepared corpora: removing stopwords, replacing each negation word and the negated word that follows it with the antonym of the negated word, and keeping only words with selected part-of-speech tags (adjectives and verbs). The third phase is text classification using Naïve Bayes and Decision Tree classifiers with two feature selection approaches, unigrams and bigrams. The experiments show that the data is extremely unbalanced. The results show that applying the text processing techniques improves the classification accuracy of the Naïve Bayes classifier and reduces the training time of both classifiers. The results also show that the Decision Tree classifier is more suitable for imbalanced data.
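The second phase described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stopword list, negation list, and antonym dictionary below are tiny hypothetical stand-ins, the order of the steps (negation replacement before stopword removal) is an assumption, and part-of-speech filtering and the classifiers are left out because they would need an external tagger and learning library.

```python
# Hypothetical toy resources; the paper's actual stopword list and
# antonym source are not stated in this record.
STOPWORDS = {"the", "a", "an", "is", "was", "this", "it", "and", "of"}
NEGATIONS = {"not", "no", "never"}
ANTONYMS = {"good": "bad", "bad": "good", "like": "dislike", "happy": "sad"}


def replace_negations(tokens):
    """Replace a negation word plus the word after it with that word's antonym.

    Pairs are left unchanged when the following word has no known antonym.
    """
    out = []
    i = 0
    while i < len(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tokens[i] in NEGATIONS and nxt in ANTONYMS:
            out.append(ANTONYMS[nxt])
            i += 2  # consume both the negation and the negated word
        else:
            out.append(tokens[i])
            i += 1
    return out


def remove_stopwords(tokens):
    """Drop tokens that appear in the (toy) stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens, n):
    """Return word n-grams as space-joined strings (n=1 unigrams, n=2 bigrams)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text):
    """Lowercase, split on whitespace, replace negations, then drop stopwords."""
    tokens = text.lower().split()
    tokens = replace_negations(tokens)
    return remove_stopwords(tokens)


tokens = preprocess("This movie is not good")
unigram_features = ngrams(tokens, 1)
bigram_features = ngrams(tokens, 2)
```

With these toy resources, "This movie is not good" reduces to the tokens `movie` and `bad`, from which the unigram and bigram features are built. Such features would then be passed to a classifier in the third phase.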
Keywords :
Bayes methods; decision trees; feature selection; pattern classification; social networking (online); text analysis; Naive Bayes classifiers; bigrams; corpora; data annotation; decision tree classifiers; feature selection approaches; sentiment analysis; social network sites; text classification; text processing techniques; unigrams; Accuracy; Facebook; Motion pictures; Niobium; Text processing; Training; Twitter; Feature Selection; Preparing Corpora; Sentiment Analysis; Social Network;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Society (i-Society), 2014 International Conference on
Conference_Location :
London
Type :
conf
DOI :
10.1109/i-Society.2014.7009006
Filename :
7009006