• DocumentCode
    453840
  • Title
    Learning Topic-Based Mixture Models for Factored Classification
  • Author
    Chen, Qiong; Mitchell, Tom M.
  • Author_Institution
    Sch. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou
  • Volume
    1
  • fYear
    2005
  • fDate
    28-30 Nov. 2005
  • Firstpage
    25
  • Lastpage
    31
  • Abstract
    We present a learning algorithm for factored classification, employing a topic-based mixture model. In factored classification, the class label is factored into a vector of class features. For example, the class label for a personal Web page at a university might be described by two features: the academic discipline of the person, and their position (e.g., 'chemistry professor' or 'physics student'). We present an approach to factored classification of text documents in which each document is assumed to be generated by a mixture of class features. This formulation allows building on recent work on topic-based mixture models for unsupervised text analysis. We present an algorithm for supervised learning of mixture models for factored classification. Experiments in two factored text classification problems (classifying Web pages and classifying the intent of email senders) demonstrate our approach, and show it can outperform earlier approaches for categories with especially sparse training data.
  • Keywords
    classification; text analysis; unsupervised learning; factored classification; sparse training data; supervised learning; text classification; text document; topic-based mixture model; unsupervised text analysis; Chemistry; Classification algorithms; Computer science; Physics; Supervised learning; Text analysis; Text categorization; Training data; Web pages; Workstations
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence for Modelling, Control and Automation, 2005 and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, International Conference on
  • Conference_Location
    Vienna
  • Print_ISBN
    0-7695-2504-0
  • Type
    conf
  • DOI
    10.1109/CIMCA.2005.1631237
  • Filename
    1631237