• DocumentCode
    595197
  • Title
    Efficient classification using phrases generated by topic models
  • Author
    Gujraniya, Deepak ; Murty, M. Narasimha
  • Author_Institution
    Dept. of Comput. Sci. & Autom., Indian Inst. of Sci., Bangalore, India
  • fYear
    2012
  • fDate
    11-15 Nov. 2012
  • Firstpage
    2331
  • Lastpage
    2334
  • Abstract
    Many popular models are available for document classification, such as the Naïve Bayes classifier, k-Nearest Neighbors, and the Support Vector Machine. In all these cases, the representation is based on the “Bag of Words” model. This model doesn't capture the actual semantic meaning of a word in a particular document. Semantics are better captured by the proximity of words and their occurrence in the document. We propose a new “Bag of Phrases” model to capture this discriminative power of phrases for text classification. We present a novel algorithm to extract phrases from the corpus using the well-known topic model Latent Dirichlet Allocation (LDA) and to integrate them into the vector space model for classification. Experiments show better classifier performance with the new Bag of Phrases model than with related representation models.
  • Keywords
    pattern classification; text analysis; LDA; Naïve Bayes classifier; bag of phrases model; k-nearest neighbors; latent Dirichlet allocation; phrase extraction; support vector machine; text classification; topic models; vector space model; Accuracy; Computational modeling; Dictionaries; Machine learning; Semantics; Support vector machines; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2012 21st International Conference on
  • Conference_Location
    Tsukuba
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4673-2216-4
  • Type
    conf
  • Filename
    6460632