• DocumentCode
    76128
  • Title
    Label Correlation Mixture Model: A Supervised Generative Approach to Multilabel Spoken Document Categorization
  • Author
    Zhiyang He; Ji Wu; Tao Li
  • Author_Institution
    Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
  • Volume
    3
  • Issue
    2
  • fYear
    2015
  • fDate
    Jun-15
  • Firstpage
    235
  • Lastpage
    245
  • Abstract
    Multilabel categorization, which is more difficult but also more practical than conventional binary and multiclass categorization, has received a great deal of attention in recent years. This paper proposes a novel probabilistic generative model, the label correlation mixture model (LCMM), to model multiply labeled documents; it can be used for multilabel spoken document categorization as well as multilabel text categorization. In LCMM, labels and topics are in one-to-one correspondence. LCMM consists of two important components: 1) a label correlation model and 2) a multilabel conditioned document model. The label correlation model formulates the process that generates labels, taking the dependencies between labels into account. We also propose an efficient algorithm for calculating the probability of generating an arbitrary subset of labels. The multilabel conditioned document model can be regarded as a supervised label mixture model, in which the labels of each document are known. Each label is characterized by a distribution over words. For learning the parameters of the multilabel conditioned document model, a discriminative approach based on minimum classification error rate training is proposed in addition to maximum-likelihood estimation. To evaluate LCMM, extensive multilabel categorization experiments are conducted on a spoken document data set and three standard text data sets. The experimental results, compared with those of other competitive methods, demonstrate the effectiveness of LCMM.
  • Keywords
    maximum likelihood estimation; mixture models; pattern classification; text analysis; LCMM; label correlation mixture model; minimum classification error rate training; multilabel conditioned document model; multilabel spoken document categorization; multilabel text categorization; parameter learning; probabilistic generative model; supervised generative approach; supervised label mixture model; Computational modeling; Correlation; Mathematical model; Probabilistic logic; Probability; Text categorization; Bayesian decision theory; minimum classification error rate method
  • fLanguage
    English
  • Journal_Title
    Emerging Topics in Computing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    2168-6750
  • Type
    jour
  • DOI
    10.1109/TETC.2014.2377559
  • Filename
    6975104
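As a rough, hypothetical illustration of the "supervised label mixture model" idea summarized in the abstract (not the LCMM authors' implementation), the sketch below assumes that each document's words are drawn from a uniform mixture of the word distributions of its known labels. It fits the per-label distributions by EM with add-one smoothing and then picks the label subset with the highest likelihood for a new document. The uniform mixture weights, the smoothing constant, the exhaustive subset search, the uniform prior over subsets (standing in for the paper's label correlation model), the toy corpus, and all function names are assumptions made for illustration. The discriminative minimum classification error rate training mentioned in the abstract is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def train_label_mixture(docs, vocab, n_iter=20, alpha=1.0):
    """Fit p(w | label) by EM, assuming each document's words come from a
    uniform mixture of its known labels' word distributions (an assumption).
    alpha is a hypothetical add-alpha smoothing constant."""
    labels = sorted({l for _, ls in docs for l in ls})
    phi = {l: {w: 1.0 / len(vocab) for w in vocab} for l in labels}
    counts = [(Counter(tokens), ls) for tokens, ls in docs]
    for _ in range(n_iter):
        expected = {l: {w: alpha for w in vocab} for l in labels}
        for c, ls in counts:
            for w, n in c.items():
                # E-step: share each word count among the document's labels in
                # proportion to p(w | label); uniform mixture weights cancel out.
                z = sum(phi[l][w] for l in ls)
                for l in ls:
                    expected[l][w] += n * phi[l][w] / z
        # M-step: normalise the smoothed expected counts for each label.
        for l in labels:
            total = sum(expected[l].values())
            phi[l] = {w: v / total for w, v in expected[l].items()}
    return phi


def log_likelihood(tokens, label_set, phi):
    """log p(tokens | label_set) under a uniform mixture; unseen words are skipped."""
    known = phi[label_set[0]]
    k = len(label_set)
    return sum(math.log(sum(phi[l][w] for l in label_set) / k)
               for w in tokens if w in known)


def best_label_subset(tokens, phi, max_size=2):
    """Exhaustive search over label subsets of size <= max_size, with a uniform
    prior over subsets standing in for a label correlation model."""
    labels = sorted(phi)
    candidates = [s for r in range(1, max_size + 1)
                  for s in combinations(labels, r)]
    return max(candidates, key=lambda s: log_likelihood(tokens, s, phi))


# Toy training corpus, invented for illustration only.
docs = [
    ("goal match team goal".split(), ["sports"]),
    ("vote election party".split(), ["politics"]),
    ("team election vote match".split(), ["sports", "politics"]),
    ("stock market price".split(), ["finance"]),
]
vocab = sorted({w for tokens, _ in docs for w in tokens})
phi = train_label_mixture(docs, vocab)
print(best_label_subset("goal team vote election".split(), phi))
```

Under these assumptions, a test document that mixes sports and politics vocabulary should score higher with the two-label subset than with either label alone, while a single-topic document keeps its single label: averaging word probabilities over several labels lowers the likelihood of words that only one label explains well, so larger subsets are not automatically favored. In LCMM, the label correlation model would presumably supply a prior over label subsets at the point where this sketch uses a uniform one; how it does so is not described in this record.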