Title :
Mood classification of lyrics using SentiWordNet
Author :
Kumar, Vipin ; Minz, Sonajharia
Author_Institution :
Sch. of Comput. & Syst. Sci., Jawaharlal Nehru Univ., New Delhi, India
Abstract :
Because text data is unstructured, it poses multiple research issues in document classification, and extracting relevant features is the foremost problem in the preprocessing stage. SentiWordNet is an ontology that assigns numeric scores to the positive and negative aspects of words. This paper explores the use of SentiWordNet to extract sentiment features of the words in song lyrics. The experiments are carried out on a collection of 185 lyrics, each belonging to one of four classes. Three classification algorithms, namely Naïve Bayesian (NB), k-Nearest Neighbor (KNN) and Support Vector Machine (SVM), have been applied to model the classifiers, combined with six measures for attribute relevance analysis: Principal Component Analysis (PCA), Latent Semantic Analysis (LSA), Chi-Square (CS), Information Gain (IG), GINI Index (GI) and Gain Ratio (GR). The experiments examine the relevance of the sentiment features for classification. Three sentiment features are used: the ratio of the positive and negative scores, the normalized ratio, and the average of the positive and negative scores. The experimental results indicate that the Naïve Bayesian classifier, using the average of the positive and negative scores as the sentiment feature and gain ratio as the feature selection criterion, achieves 78.27% accuracy based on the top 10% of the features. The second-best accuracy has been achieved by SVM-based classifiers using the average of the positive and negative scores as the sentiment feature and the top 10% of the features under every feature selection criterion except CS.
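The abstract names the three sentiment features and gain ratio as the best-performing selection criterion but does not give their formulas. The following is a minimal sketch under stated assumptions, not the authors' implementation: the per-word positive/negative scores are assumed to have already been looked up in SentiWordNet, the "normalized ratio" is assumed to be (pos - neg) / (pos + neg), and the hypothetical helpers `sentiment_features`, `gain_ratio` and `top_fraction` rank discrete feature columns by gain ratio and keep the top 10%. Continuous features such as the average score would need to be discretized (for example, binned) before this gain ratio computation applies.

```python
import math
from collections import Counter


def sentiment_features(pos, neg, eps=1e-6):
    """Three per-word sentiment features from (assumed) SentiWordNet scores.

    The exact formulas are assumptions; eps avoids division by zero.
    """
    ratio = pos / (neg + eps)                      # positive-to-negative ratio
    normalized = (pos - neg) / (pos + neg + eps)   # assumed "normalized ratio", in [-1, 1]
    average = (pos + neg) / 2                      # average of the two scores
    return ratio, normalized, average


def entropy(values):
    """Shannon entropy (bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    # Class entropy remaining after partitioning on the feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


def top_fraction(columns, labels, fraction=0.10):
    """Return the names of the top `fraction` of feature columns by gain ratio."""
    scores = {name: gain_ratio(vals, labels) for name, vals in columns.items()}
    k = max(1, math.ceil(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In this sketch, a column whose values perfectly separate the mood classes scores 1.0 and a constant column scores 0.0. The selected columns would then be passed to a classifier such as Naïve Bayes, which is outside the scope of this example.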
Keywords :
Bayes methods; feature extraction; information retrieval; music; ontologies (artificial intelligence); pattern classification; principal component analysis; support vector machines; text analysis; Chi-Square classifiers; GINI index; KNN classifiers; LSA; Naive Bayesian classifier; SVM-based classifiers; SentiWordNet; attribute relevance analysis; classification algorithms; document classification; feature selection criteria; gain ratio; information gain; k-nearest neighbor classifiers; latent semantic analysis; lyrics mood classification; numeric scores; ontology; principal component analysis; sentiment feature extraction; support vector machine; text data; Accuracy; Classification algorithms; Computers; Feature extraction; Mood; Support vector machines; Feature Reduction; Lyrics; Mood Classification; SentiWordNet; Sentiment Feature;
Conference_Title :
Computer Communication and Informatics (ICCCI), 2013 International Conference on
Conference_Location :
Coimbatore
Print_ISBN :
978-1-4673-2906-4
DOI :
10.1109/ICCCI.2013.6466307