DocumentCode :
3379406
Title :
Usable speech detection using a context dependent Gaussian mixture model classifier
Author :
Yantorno, Robert E. ; Smolenski, Brett Y. ; Iyer, Ananth N. ; Shah, Jashmin K.
Author_Institution :
Dept. of ECE, Temple Univ., Philadelphia, PA, USA
Volume :
5
fYear :
2004
fDate :
23-26 May 2004
Abstract :
Speech that is corrupted by nonstationary interference, but contains segments that are still usable for applications such as speaker identification or speech recognition, is referred to as "usable" speech. A common example of nonstationary interference occurs when there is more than one person talking at the same time, which is known as co-channel speech. In general, the above speech processing applications do not work in co-channel environments; however, they can work on the extracted usable segments. Unfortunately, currently available usable speech measures only detect about 75% of the total available usable speech. The first reason for this high error stems from the fact that no single feature can accurately identify all the usable speech characteristics. This situation can be resolved by using a Gaussian mixture model (GMM) based classifier to combine several usable speech features. A second source of error stems from the fact that the current usable speech measures treat each frame of co-channel data independently of the decisions made on adjacent frames. This situation can be resolved by using a hidden Markov model (HMM) to incorporate any context dependent information in adjacent frames. Using this approach we were able to obtain 84% detection of usable speech with a 16% false alarm rate.
Keywords :
Gaussian processes; hidden Markov models; speech processing; speech recognition; Gaussian mixture model classifier; co-channel speech; context dependence; context dependent information; corrupted speech; error source; extracted usable segments; false alarm rate; hidden Markov model; independent frame treatment; nonstationary interference; speaker identification; speech processing applications; speech recognition; usable speech detection; usable speech features; usable speech identification; usable speech reduction; usable speech segments; Audio recording; Context modeling; Current measurement; Hidden Markov models; Interference; Predictive models; Signal restoration; Speech enhancement; Speech processing; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Circuits and Systems, 2004. ISCAS '04. Proceedings of the 2004 International Symposium on
Print_ISBN :
0-7803-8251-X
Type :
conf
DOI :
10.1109/ISCAS.2004.1329884
Filename :
1329884