Title :
Classification of children stories in Hindi using keywords and POS density
Author :
D M Harikrishna;K. Sreenivasa Rao
Author_Institution :
Indian Institute of Technology Kharagpur, India
Abstract :
The main objective of this work is to classify Hindi stories into three genres: fable, folk-tale and legend. We propose a framework for story classification using keyword-based and part-of-speech (POS) based features. Keyword-based features such as Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) are used. The effect of POS tags such as noun, pronoun and adjective is analyzed for the different story genres. Classification performance is analyzed using different combinations of features with three classifiers: Naive Bayes (NB), k-Nearest Neighbour (KNN) and Support Vector Machine (SVM). The experimental studies show that combining linguistic and keyword-based features does not significantly improve classifier performance. Among the classifiers, SVM models outperformed the other models.
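The two feature families named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact TF-IDF weighting or the definition of POS density, so the code assumes the common idf(t) = log(N / df(t)) variant and takes POS density to be the fraction of tokens carrying a given tag. The toy corpus, the tag names and all function names are hypothetical, and the tokens are English placeholders rather than Hindi text.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """Relative frequency of each term within a single document."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on an empty document
    return {term: n / total for term, n in counts.items()}


def inverse_document_frequency(docs):
    """idf(t) = log(N / df(t)); an assumed variant, not taken from the paper."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf(tokens, idf):
    """TF-IDF weight of each term in one document, given corpus-level idf."""
    tf = term_frequency(tokens)
    return {term: weight * idf.get(term, 0.0) for term, weight in tf.items()}


def pos_density(tags, tagset=("NOUN", "PRON", "ADJ")):
    """Fraction of tokens carrying each tag (assumed definition of POS density)."""
    counts = Counter(tags)
    total = len(tags) or 1
    return {tag: counts[tag] / total for tag in tagset}


# Hypothetical toy corpus: one tokenized "story" per genre.
docs = [
    ["fox", "crow", "fox", "moral"],     # fable-like
    ["king", "village", "moral"],        # folk-tale-like
    ["king", "battle", "king", "hero"],  # legend-like
]

idf = inverse_document_frequency(docs)
features = tfidf(docs[0], idf)

# Hypothetical POS tags for a four-token sentence.
density = pos_density(["NOUN", "NOUN", "PRON", "VERB"])
```

A per-story feature vector could then be formed by concatenating the TF-IDF weights with the POS densities before passing it to a classifier such as NB, KNN or SVM.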
Keywords :
"Support vector machines","Niobium","Pragmatics","Conferences","Computers","Text categorization","Tagging"
Conference_Title :
Computer, Communication and Control (IC4), 2015 International Conference on
DOI :
10.1109/IC4.2015.7375666