Title :
Efficient classification using phrases generated by topic models
Author :
Gujraniya, Deepak ; Murty, M. Narasimha
Author_Institution :
Dept. of Comput. Sci. & Autom., Indian Inst. of Sci., Bangalore, India
Abstract :
Many popular models are available for document classification, such as the Naïve Bayes classifier, k-Nearest Neighbors, and the Support Vector Machine. In all these cases, the representation is based on the “Bag of Words” model. This model doesn't capture the actual semantic meaning of a word in a particular document. Semantics are better captured by the proximity of words and their occurrence in the document. We propose a new “Bag of Phrases” model to capture this discriminative power of phrases for text classification. We present a novel algorithm to extract phrases from the corpus using the well-known topic model, Latent Dirichlet Allocation (LDA), and to integrate them into the vector space model for classification. Experiments show that classifiers perform better with the new Bag of Phrases model than with related representation models.
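The difference between the two representations can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not describe how LDA produces the phrases, so the phrase list `PHRASES` below is a hypothetical stand-in for that extraction step, and the greedy longest-match merging and the helper names (`bag_of_words`, `bag_of_phrases`, `to_vector`) are assumptions made only for illustration.

```python
from collections import Counter


def bag_of_words(tokens):
    """Plain bag of words: count every token independently."""
    return Counter(tokens)


def bag_of_phrases(tokens, phrases):
    """Count tokens, merging adjacent runs that match a known phrase.

    `phrases` is a set of token tuples. Matching is greedy, left to right,
    trying the longest candidate first (an assumption of this sketch).
    """
    max_len = max((len(p) for p in phrases), default=1)
    counts = Counter()
    i = 0
    while i < len(tokens):
        # Try candidate phrases from the longest length down to 2 tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in phrases:
                counts[" ".join(candidate)] += 1
                i += n
                break
        else:
            # No phrase starts here, so count the single word.
            counts[tokens[i]] += 1
            i += 1
    return counts


def to_vector(counts, vocabulary):
    """Map term counts onto a fixed vocabulary order (vector space model)."""
    return [counts.get(term, 0) for term in vocabulary]


# Hypothetical phrase list standing in for the output of a topic-model step.
PHRASES = {("support", "vector", "machine"), ("topic", "model")}

doc = "a topic model feeds a support vector machine".split()
vocabulary = ["a", "topic model", "support vector machine", "vector"]

phrase_counts = bag_of_phrases(doc, PHRASES)
print(to_vector(phrase_counts, vocabulary))
```

In this sketch the word "vector" no longer appears as a separate feature once it is absorbed into the phrase "support vector machine", which is the kind of context a plain word count discards.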
Keywords :
pattern classification; text analysis; LDA; Naïve Bayes classifier; bag of phrases model; k-nearest neighbors; latent Dirichlet allocation; phrase extraction; support vector machine; text classification; topic models; vector space model; Accuracy; Computational modeling; Dictionaries; Machine learning; Semantics; Support vector machines; Vectors;
Conference_Title :
Pattern Recognition (ICPR), 2012 21st International Conference on
Conference_Location :
Tsukuba
Print_ISBN :
978-1-4673-2216-4