DocumentCode
1290810
Title
A Probabilistic Interaction Model for Multipitch Tracking With Factorial Hidden Markov Models
Author
Wohlmayr, Michael ; Stark, Michael ; Pernkopf, Franz
Author_Institution
Signal Process. & Speech Commun. Lab. (SPSC), Graz Univ. of Technol., Graz, Austria
Volume
19
Issue
4
fYear
2011
fDate
5/1/2011 12:00:00 AM
Firstpage
799
Lastpage
810
Abstract
We present a simple and efficient feature modeling approach for tracking the pitch of two simultaneously active speakers. We model the spectrogram features of single speakers using Gaussian mixture models in combination with the minimum description length model selection criterion. To obtain a probabilistic representation for the speech mixture spectrogram features of both speakers, we employ the mixture maximization model (MIXMAX) and, as an alternative, a linear interaction model. A factorial hidden Markov model is applied for tracking pitch over time. This statistical model can be used for applications beyond speech, whenever the interaction between individual sources can be represented as a MIXMAX or linear model. For tracking, we use the loopy max-sum algorithm, and provide empirical comparisons to exact methods. Furthermore, we discuss a scheduling mechanism for loopy belief propagation for online tracking. We demonstrate experimental results using Mocha-TIMIT as well as data from the speech separation challenge provided by Cooke et al. We show the excellent performance of the proposed method in comparison to a well-known multipitch tracking algorithm based on correlogram features. Using speaker-dependent models, the proposed method improves the accuracy of correct speaker assignment, which is important for single-channel speech separation. In particular, we are able to reduce the overall tracking error by 51% relative for the speaker-dependent case. Moreover, we use the estimated pitch trajectories to perform single-channel source separation, and demonstrate the beneficial effect of correct speaker assignment on speech separation performance.
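The MIXMAX interaction model named in the abstract can be illustrated with a minimal sketch. It assumes the usual MIXMAX approximation, in which each log-spectral bin of the mixture is the larger of the two speakers' values in that bin. For brevity, each speaker is modeled here by a single diagonal Gaussian per frame rather than the Gaussian mixture models the abstract describes; with mixtures, the same per-bin term would be summed over weighted component pairs. All function and parameter names are hypothetical, and this is not the authors' implementation.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, std):
    """Cumulative distribution of a univariate Gaussian at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def mixmax_bin_density(y, mean1, std1, mean2, std2):
    """Density of y = max(x1, x2) for independent Gaussian x1 and x2.

    Either source 1 equals y while source 2 lies below it, or the reverse.
    """
    return (normal_pdf(y, mean1, std1) * normal_cdf(y, mean2, std2)
            + normal_cdf(y, mean1, std1) * normal_pdf(y, mean2, std2))


def mixmax_log_likelihood(frame, means1, stds1, means2, stds2):
    """Log-likelihood of one mixture frame, treating bins as independent."""
    total = 0.0
    for y, m1, s1, m2, s2 in zip(frame, means1, stds1, means2, stds2):
        # Floor the density so a numerically zero value does not break log().
        total += math.log(max(mixmax_bin_density(y, m1, s1, m2, s2), 1e-300))
    return total
```

In a tracker of the kind the abstract describes, a score like this would be evaluated for every pair of candidate pitch states, with the Gaussian parameters depending on each speaker's pitch state; that dependence is omitted from the sketch.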
Keywords
hidden Markov models; optimisation; speech recognition; Gaussian mixture model; factorial hidden Markov model; mixture maximization model; multipitch tracking; probabilistic interaction model; single channel source separation; Factorial hidden Markov model (FHMM); Gaussian mixture model (GMM); mixture maximization; multipitch tracking; speech analysis
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
ieee
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2010.2064309
Filename
5545375