Identification of Objectionable Audio Segments Based on Pseudo and Heterogeneous Mixture Models

Author

Ziqiang Shi; Jiqing Han; Tieran Zheng; Ji Li

Author_Institution

Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China

Volume

21

Issue

3

fYear

2013

fDate

March 2013

Firstpage

611

Lastpage

623

Abstract

In this paper, we generalize the Gaussian Mixture Model (GMM) in two ways: a) by introducing novel distance measures between two vectors based on nonlinear maps, which yields more general mixture models; b) by building mixture models from multiple different kinds of distributions. These two generalizations address different problems that arise in feature modeling. The mixture model obtained by the first method is called a pseudo Gaussian Mixture Model (pseudo GMM). Compared with the traditional GMM, pseudo GMMs with nonlinear maps perform better on nonlinear problems, while the computational complexity of their iteration procedure is almost the same as that of the Expectation-Maximization (EM) algorithm for the traditional GMM. The second generalization addresses the fact that practical learning problems often involve multiple, heterogeneous data sources, whereas classical mixture models are based on a single kind of distribution. In this work, we consider heterogeneous mixture models (hetMM) built from multiple different kinds of distributions. The different types of distributions in a hetMM may have quite different properties and may capture different features of the data. Component classifiers, including pseudo-GMM-based and hetMM-based classifiers, are employed in our task of erotic audio recognition. Online and off-line experiments with classifiers built on pseudo GMM and hetMM show that the proposed approach is highly effective for erotic audio recognition.
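The abstract does not spell out how a hetMM is fitted. As a rough illustration of the idea that the components of one mixture can come from different distribution families, the sketch below evaluates a one-dimensional mixture of a Gaussian, a Student's t, and a logistic component (the distributions named in the keywords) and performs an EM-style E-step followed by the mixture-weight update. The function names, the 1-D setting, and all parameter values are hypothetical assumptions for illustration only, not the authors' implementation; updating the Student's t and logistic location/scale parameters would require additional M-step procedures that are not shown here.

```python
import math

# Hypothetical 1-D component log-densities for a heterogeneous mixture.
def gaussian_logpdf(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def student_t_logpdf(x, mu, sigma, nu):
    # Location-scale Student's t with nu degrees of freedom.
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))

def logistic_logpdf(x, mu, s):
    # The logistic pdf is symmetric in z, so |z| gives a numerically stable form.
    a = abs((x - mu) / s)
    return -a - math.log(s) - 2.0 * math.log1p(math.exp(-a))

def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def e_step(data, weights, comp_logpdfs):
    """Return per-sample responsibilities and the total log-likelihood."""
    resp, total_ll = [], 0.0
    for x in data:
        logs = [math.log(w) + f(x) for w, f in zip(weights, comp_logpdfs)]
        lse = logsumexp(logs)
        total_ll += lse
        resp.append([math.exp(l - lse) for l in logs])
    return resp, total_ll

def update_weights(resp):
    """M-step for the mixing weights only: average responsibility per component."""
    n, k = len(resp), len(resp[0])
    return [sum(r[j] for r in resp) / n for j in range(k)]

# Illustrative (made-up) components and data.
components = [
    lambda x: gaussian_logpdf(x, 0.0, 1.0),
    lambda x: student_t_logpdf(x, 3.0, 1.0, 4.0),
    lambda x: logistic_logpdf(x, -3.0, 0.5),
]
weights = [1.0 / 3, 1.0 / 3, 1.0 / 3]
data = [-3.2, -2.9, 0.1, -0.4, 2.8, 3.5, 9.0]

resp, ll = e_step(data, weights, components)
new_w = update_weights(resp)
print("log-likelihood:", ll)
print("updated weights:", new_w)
```

Because each component only has to supply a log-density, families with different tail behavior can coexist in one mixture: in this toy run the outlying sample 9.0 is absorbed almost entirely by the heavy-tailed Student's t component rather than the Gaussian.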

Keywords

Gaussian distribution; audio signal processing; computational complexity; expectation-maximisation algorithm; iterative methods; learning (artificial intelligence); erotic audio recognition; expectation maximization algorithm; feature modeling; hetMM; heterogeneous data source; heterogeneous mixture model; iteration procedure; learning problem; nonlinear maps; nonlinear problem; objectionable audio segment identification; pseudo GMM; pseudo Gaussian mixture model; Computational modeling; Data models; Kernel; Speech; Speech processing; Ensemble classifiers; Gaussian mixture model; SVM; erotic audio; expectation-maximization (EM) algorithm; logistic distribution; Student's t-distribution; voiced fragment;

fLanguage

English

Journal_Title

IEEE Transactions on Audio, Speech, and Language Processing

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2012.2229980

Filename

6362181