  • DocumentCode
    2150843
  • Title
    A hierarchical generative model for Generic Audio Document Categorization
  • Author
    Zeng, Zhi; Zhang, Shuwu
  • Author_Institution
    Inst. of Autom., Chinese Acad. of Sci., Beijing, China
  • fYear
    2011
  • fDate
    22-27 May 2011
  • Firstpage
    405
  • Lastpage
    408
  • Abstract
    In this paper, we refer to the pattern classification problem of assigning a category label to a long audio signal based on its semantic content as Generic Audio Document Categorization (GADC). A novel generative model is proposed to describe generic audio document categories and solve the GADC problem. The model is a four-level hierarchical model in which two latent variables, "audio topic" and "audio word", are introduced in addition to the two observed variables, category and audio feature. We present an iterative learning algorithm comprising two Expectation-Maximization (EM) cycles to estimate the model parameters, and we introduce a discriminative document weighting procedure to make the model more discriminative. The distribution of "audio topic" in the trained model is then used to represent each generic audio document category, as in some bag-of-words methods. Unlike those methods, however, our approach does not require quantizing the continuous audio features into a vocabulary of "audio words". Experimental results show the effectiveness of the approach.
  • Keywords
    audio signal processing; expectation-maximisation algorithm; learning (artificial intelligence); parameter estimation; signal classification; EM cycle; GADC problem; audio feature; audio signal; audio topic; audio word; bag-of-word method; category label; discriminative document weighting procedure; expectation-maximization cycle; four-level hierarchical model; generic audio document categorization; generic audio document category; hierarchical generative model; iterative learning algorithm; latent variable; model parameter estimation; pattern classification; semantic content; vocabulary; Accuracy; Nickel; Semantics; Speech; Training; Video sequences; Vocabulary; Audio content analysis; generative model; generic audio document categorization;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
  • Conference_Location
    Prague
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4577-0538-0
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2011.5946426
  • Filename
    5946426