DocumentCode :
1861539
Title :
Selection, parameter estimation, and discriminative training of hidden Markov models for general audio modeling
Author :
Reyes-Gomez, Manuel J. ; Ellis, Daniel P W
Author_Institution :
Dept. of Electr. Eng., Columbia Univ., New York, NY, USA
Volume :
1
fYear :
2003
fDate :
6-9 July 2003
Abstract :
Hidden Markov models (HMMs) offer a natural and flexible way to model time-sequential data. The ease with which concatenation and time-warping algorithms can be implemented on HMMs makes them well suited to segmentation and content-based audio classification, as is clear from their extensive and successful use in speech recognition. Speech has a natural basic unit, the phone, which normally limits the number of models to one per phone. Moreover, knowledge of the structure of speech facilitates the choice of model parameters. When modeling generic audio, on the other hand, the lack of a natural basic unit and the absence of a clear structure make the selection and parameter estimation of an optimal set of HMMs difficult. In this paper we present different approaches to selecting and estimating the HMM parameters of a set of representative generic audio classes. We compare these approaches in the context of a content-based classification application using the MuscleFish database. The models are first found through frame clustering or by traditional EM techniques under specific selection criteria, such as the Bayesian information criterion. Further discriminative training of the initial models considerably improves their performance in the content-based classification task, yielding results comparable to those obtained on the same task by inherently discriminative classification methods, such as support vector machines, while preserving the intrinsic flexibility of HMMs.
Keywords :
audio signal processing; hidden Markov models; parameter estimation; statistical analysis; Bayesian information criterion; MuscleFish database; content based audio classification applications; discriminative training; frame clustering; general audio modeling; generic audio classes; hidden Markov models; parameter estimation; speech recognition applications; time-sequential data; Bayesian methods; Databases; Decoding; Dictionaries; Hidden Markov models; Parameter estimation; Speech recognition; Streaming media; Support vector machine classification; Support vector machines;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
Print_ISBN :
0-7803-7965-9
Type :
conf
DOI :
10.1109/ICME.2003.1220857
Filename :
1220857