DocumentCode :
595197
Title :
Efficient classification using phrases generated by topic models
Author :
Gujraniya, Deepak ; Murty, M. Narasimha
Author_Institution :
Dept. of Comput. Sci. & Autom., Indian Inst. of Sci., Bangalore, India
fYear :
2012
fDate :
11-15 Nov. 2012
Firstpage :
2331
Lastpage :
2334
Abstract :
Many popular models are available for document classification, such as the Naïve Bayes classifier, k-Nearest Neighbors, and the Support Vector Machine. In all these cases, the representation is based on the “Bag of Words” model. This model does not capture the actual semantic meaning of a word in a particular document. Semantics are better captured by the proximity of words and their occurrence in the document. We propose a new “Bag of Phrases” model to capture this discriminative power of phrases for text classification. We present a novel algorithm to extract phrases from the corpus using the well-known topic model Latent Dirichlet Allocation (LDA) and to integrate them into the vector space model for classification. Experiments show that classifiers perform better with the new Bag of Phrases model than with related representation models.
Keywords :
pattern classification; text analysis; LDA; Naïve Bayes classifier; bag of phrases model; k-nearest neighbors; latent Dirichlet allocation; phrase extraction; support vector machine; text classification; topic models; vector space model; Accuracy; Computational modeling; Dictionaries; Machine learning; Semantics; Support vector machines; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Pattern Recognition (ICPR), 2012 21st International Conference on
Conference_Location :
Tsukuba
ISSN :
1051-4651
Print_ISBN :
978-1-4673-2216-4
Type :
conf
Filename :
6460632