DocumentCode :
3764851
Title :
Sentiment analysis on (Bengali horoscope) corpus
Author :
Tirthankar Ghosal;Sajal K. Das;Saprativa Bhattacharjee
Author_Institution :
Sikkim Manipal Institute of Technology, Majitar, Rangpo, East Sikkim, India - 737136
fYear :
2015
Firstpage :
1
Lastpage :
6
Abstract :
Sentiment analysis, in its simplest form, is the classification of a piece of text as positive or negative according to its polarity. Horoscopes give predictions for each of the twelve zodiac signs and are very popular in India; all major TV channels and newspapers publish their horoscope experts' predictions daily. These daily horoscopes are well suited to sentiment analysis because a high percentage of their sentences carry strong sentiment. This work deals with sentiment analysis of Bengali daily horoscopes. A corpus of 6000 sentences is created by crawling the daily horoscope section of a leading Bengali newspaper's website. Each sentence is annotated with its polarity (positive or negative) by a team of three independent annotators. A lexicon of 58 stop words is also built from the most frequently occurring words in the corpus. Five well-known classification algorithms are compared: Naïve Bayes, Support Vector Machines (SVM), k-Nearest Neighbours, Decision Tree and Random Forest. Each algorithm is tested with three different input features (unigram, bigram and trigram presence), with and without stop word removal and feature selection by the information gain metric. SVM with all unigram features, with neither stop word removal nor information-gain feature selection, proves to be the best combination, producing an accuracy of 98.7%.
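The feature pipeline named in the abstract (n-gram presence, stop word removal, information gain ranking) can be sketched in a few lines of standard-library Python. This is only an illustrative sketch: the record does not say which tools the authors used, the toy sentences are English placeholders rather than corpus data, and the names `ngram_presence`, `entropy`, `information_gain` and the two-word `stop_words` set are hypothetical.

```python
import math
from collections import Counter


def ngram_presence(tokens, n):
    """Return the set of n-grams present in a token list (presence, not counts)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature, docs, labels):
    """IG of a binary presence feature: H(Y) minus H(Y | feature present or absent)."""
    present = [y for d, y in zip(docs, labels) if feature in d]
    absent = [y for d, y in zip(docs, labels) if feature not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


# Toy placeholder data (assumption): not taken from the Bengali corpus.
sentences = [
    ("good day for money".split(), "pos"),
    ("good news from family".split(), "pos"),
    ("bad day for travel".split(), "neg"),
    ("bad health trouble today".split(), "neg"),
]
stop_words = {"for", "from"}  # hypothetical stand-in for the 58-word lexicon

# Unigram presence features after stop word removal (use n=2 or n=3 for bigrams/trigrams).
docs = [ngram_presence([t for t in toks if t not in stop_words], 1) for toks, _ in sentences]
labels = [y for _, y in sentences]

# Rank the vocabulary by information gain, breaking ties alphabetically.
vocab = set().union(*docs)
ranked = sorted(vocab, key=lambda f: (-information_gain(f, docs, labels), f))
```

In this toy set the unigrams "good" and "bad" separate the two classes perfectly and rank first, while "day" appears once in each class and carries no information. The abstract reports that the best configuration skipped both stop word removal and information-gain selection, so these two steps are optional variants rather than part of the winning setup.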
Keywords :
"Sentiment analysis","Support vector machines","Algorithm design and analysis","Measurement","Blogs","Entropy","Machine learning algorithms"
Publisher :
IEEE
Conference_Title :
India Conference (INDICON), 2015 Annual IEEE
Electronic_ISBN :
2325-9418
Type :
conf
DOI :
10.1109/INDICON.2015.7443551
Filename :
7443551