Robust Speaker Recognition in Noisy Conditions

Author

Ming, Ji ; Hazen, Timothy J. ; Glass, James R. ; Reynolds, Douglas A.

Author_Institution

Queen's Univ. Belfast, Belfast

Volume

15

Issue

5

fYear

2007

fDate

7/1/2007 12:00:00 AM

Firstpage

1711

Lastpage

1723

Abstract

This paper investigates the problem of speaker identification and verification in noisy conditions, assuming that speech signals are corrupted by environmental noise, but knowledge about the noise characteristics is not available. This research is motivated in part by the potential application of speaker recognition technologies on handheld devices or the Internet. While the technologies promise an additional biometric layer of security to protect the user, the practical implementation of such systems faces many challenges. One of these is environmental noise. Due to the mobile nature of such systems, the noise sources can be highly time-varying and potentially unknown. This raises the requirement for noise robustness in the absence of information about the noise. This paper describes a method that combines multicondition model training and missing-feature theory to model noise with unknown temporal-spectral characteristics. Multicondition training is conducted using simulated noisy data with limited noise variation, providing a "coarse" compensation for the noise, and missing-feature theory is applied to refine the compensation by ignoring noise variation outside the given training conditions, thereby reducing the training and testing mismatch. This paper is focused on several issues relating to the implementation of the new model for real-world applications. These include the generation of multicondition training data to model noisy speech, the combination of different training data to optimize the recognition performance, and the reduction of the model's complexity. The new algorithm was tested using two databases with simulated and realistic noisy speech data. The first database is a redevelopment of the TIMIT database by rerecording the data in the presence of various noise types, used to test the model for speaker identification with a focus on the varieties of noise. The second database is a handheld-device database collected in realistic noisy conditions, used to further validate the model for real-world speaker verification. The new model is compared to baseline systems and is found to achieve lower error rates.
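
The two ingredients named in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes white Gaussian noise as the simulated corruption, a single diagonal-covariance Gaussian as the model, and a reliability mask that is already given. All function names (`add_noise_at_snr`, `make_multicondition_set`, `marginal_log_likelihood`) are hypothetical and introduced only for illustration.

```python
import numpy as np


def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR (in dB), then add it."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(clean_power / scaled_noise_power)
    target_noise_power = clean_power / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_noise_power / noise_power)
    return clean + scale * noise


def make_multicondition_set(clean, snr_levels_db, seed=0):
    """Simulate noisy copies of `clean` at several SNRs.

    Assumption: white Gaussian noise stands in for the simulated noise.
    """
    rng = np.random.default_rng(seed)
    return {
        snr: add_noise_at_snr(clean, rng.standard_normal(clean.shape), snr)
        for snr in snr_levels_db
    }


def marginal_log_likelihood(x, mean, var, reliable):
    """Diagonal-Gaussian log-likelihood summed over reliable dimensions only.

    Dimensions flagged unreliable are marginalized out, which for a diagonal
    covariance amounts to dropping their terms from the sum.
    """
    per_dim = -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)
    return float(np.sum(per_dim[reliable]))


# Usage: build a small multicondition set, then score a feature vector
# whose last dimension is assumed to be corrupted by unseen noise.
clean = np.sin(np.linspace(0.0, 20.0, 1000))
training_set = make_multicondition_set(clean, [0, 10, 20])

x = np.array([0.0, 0.0, 100.0])
mask = np.array([True, True, False])
score = marginal_log_likelihood(x, np.zeros(3), np.ones(3), mask)
```

In this sketch the noisy copies play the role of the "coarse" compensation, and dropping the unreliable dimension keeps a mismatched feature from dominating the score. How the reliable subset is chosen is left open here.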

Keywords

acoustic noise; speaker recognition; missing-feature theory; multicondition model training; noisy conditions; robust speaker recognition; speaker identification; speaker verification; speech signals; temporal-spectral characteristics; Databases; Handheld computers; Noise reduction; Noise robustness; Signal processing; Speech enhancement; Testing; Training data; Working environment noise; multicondition training; noise compensation; noise modeling

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2007.899278

Filename

4244529