• DocumentCode
    695479
  • Title
    An extension of topic models for text classification: A term weighting approach
  • Author
    Seonggyu Lee ; Jinho Kim ; Sung-Hyon Myaeng
  • Author_Institution
    Div. of Web Sci. & Technol., KAIST, Daejeon, South Korea
  • fYear
    2015
  • fDate
    9-11 Feb. 2015
  • Firstpage
    217
  • Lastpage
    224
  • Abstract
    Text classification has become a critical step in big data analytics. For supervised machine learning approaches to text classification, the availability of sufficient training data with classification labels attached to individual text units is essential to performance. Because labeled data are usually scarce, however, it is desirable to devise a semi-supervised method in which unlabeled data are used in addition to labeled data. One solution is to apply a latent factor model to generate clustered text features and use them for text classification. The main thrust of the current research is to extend Latent Dirichlet Allocation (LDA) for this purpose by considering word weights in sampling and maintaining the balance of topic distributions. A series of experiments was conducted to evaluate the proposed method on classification tasks. The results show that the topic distributions generated by the balance weighted topic modeling method add some discriminative power to feature generation for classification.
  • Keywords
    Big Data; data analysis; learning (artificial intelligence); natural language processing; pattern classification; pattern clustering; text analysis; Big Data analytics; LDA; balance weighted topic modeling method; classification labels; clustered text feature generation; discriminative power; individual text units; labeled data; latent Dirichlet allocation; latent factor model; supervised machine learning approach; term weighting approach; text classification; topic distribution; training data; word weights; Data models; Feature extraction; Resource management; Text categorization; Training; Training data; Vocabulary; Latent Dirichlet Allocation; Topic modeling; feature generation; text classification; text clustering;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data and Smart Computing (BigComp), 2015 International Conference on
  • Conference_Location
    Jeju
  • Type
    conf
  • DOI
    10.1109/35021BIGCOMP.2015.7072834
  • Filename
    7072834
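
  • Illustrative sketch (not from the paper)
    The abstract mentions "considering word weights in sampling" when extending LDA. The record does not give the authors' equations, so the following is only a minimal sketch of one common way to do this: a collapsed Gibbs sampler in which each token adds its term weight, rather than 1, to the count tables. The function name `weighted_lda`, the toy corpus, the weight values, and the hyperparameters are all assumptions made for illustration. The balancing of topic distributions described in the abstract is not modeled here.

```python
import random

def weighted_lda(docs, weights, num_topics, vocab_size,
                 alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA with per-term weights (illustrative).

    docs    : list of documents, each a list of integer word ids
    weights : weights[w] is the (assumed) weight of word id w
    Returns the per-document topic distributions (one row per document).
    """
    rng = random.Random(seed)
    doc_topic = [[0.0] * num_topics for _ in docs]             # weighted n(d, k)
    topic_word = [[0.0] * vocab_size for _ in range(num_topics)]  # weighted n(k, w)
    topic_total = [0.0] * num_topics                           # weighted n(k)
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_d.append(z)
            doc_topic[d][z] += weights[w]
            topic_word[z][w] += weights[w]
            topic_total[z] += weights[w]
        assign.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z, wt = assign[d][i], weights[w]
                # Remove the token's weighted contribution.
                doc_topic[d][z] -= wt
                topic_word[z][w] -= wt
                topic_total[z] -= wt
                # Unnormalized conditional over topics for this token.
                probs = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=probs)[0]
                # Add the contribution back under the newly sampled topic.
                assign[d][i] = z
                doc_topic[d][z] += wt
                topic_word[z][w] += wt
                topic_total[z] += wt

    # Smoothed, normalized topic distribution per document.
    theta = []
    for row in doc_topic:
        total = sum(row) + num_topics * alpha
        theta.append([(c + alpha) / total for c in row])
    return theta

# Toy corpus: word ids 0-2 dominate the first two documents, 3-5 the last two.
docs = [[0, 1, 0, 2], [1, 0, 2, 0], [3, 4, 3, 5], [4, 3, 5, 3]]
weights = [1.0, 1.0, 0.5, 1.0, 1.0, 0.5]  # hypothetical term weights
theta = weighted_lda(docs, weights, num_topics=2, vocab_size=6)
```

    Each row of `theta` could then serve as a low-dimensional feature vector for a downstream classifier, which is the general role the abstract assigns to topic distributions. Because topic inference needs no labels, unlabeled documents can be included when fitting the topics.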