DocumentCode :
1403217
Title :
Speaker Identification Within Whispered Speech Audio Streams
Author :
Fan, Xing ; Hansen, John H L
Author_Institution :
Dept. of Electr. Eng., Univ. of Texas at Dallas, Richardson, TX, USA
Volume :
19
Issue :
5
fYear :
2011
fDate :
7/1/2011
Firstpage :
1408
Lastpage :
1421
Abstract :
Whisper is an alternative speech production mode used by subjects in natural conversation to protect privacy. Due to the profound differences between whisper and neutral speech in both excitation and vocal tract function, the performance of speaker identification systems trained with neutral speech degrades significantly. In this paper, a seamless neutral/whisper mismatched closed-set speaker recognition system is developed. First, the performance characteristics of a neutral-trained closed-set speaker ID system based on a Mel-frequency cepstral coefficient-Gaussian mixture model (MFCC-GMM) framework are considered. It is observed that for whisper speaker recognition, performance degradation is concentrated in only a subset of speakers. Next, it is shown that the performance loss for speaker identification in neutral/whisper mismatched conditions is focused on phonemes other than low-energy unvoiced consonants. In order to increase system performance for unvoiced consonants, an alternative feature extraction algorithm based on linear and exponential frequency scales is applied. The acoustic properties of misrecognized and correctly recognized whisper are analyzed in order to develop more effective processing schemes. A two-dimensional feature space is proposed to predict the whispered utterances on which the system will perform poorly, with evaluations conducted to measure the quality of whispered speech. Finally, a system for seamless neutral/whisper speaker identification is proposed, resulting in an absolute improvement of 8.85%-10.30% for speaker recognition, with the best closed-set speaker ID performance of 88.35% obtained for a total of 961 read whisper test utterances, and 83.84% for a total of 495 spontaneous whisper test utterances.
Keywords :
Gaussian processes; feature extraction; speech recognition; MFCC-GMM; Mel-frequency cepstral coefficient-Gaussian mixture model; closed-set speaker ID system; feature extraction algorithm; neutral speech; neutral-whisper mismatched closed-set speaker recognition system; speaker identification; speech production mode; two-dimensional feature space; vocal tract function; whispered speech audio streams; Acoustics; Degradation; Production; Speaker recognition; Speech; Speech processing; System performance; Mel-frequency cepstral coefficient (MFCC); robust speaker verification; speaker identification; vocal effort; whispered speech;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2010.2091631
Filename :
5667042