• DocumentCode
    960364
  • Title

    Robust Speaker Recognition in Noisy Conditions

  • Author

    Ming, Ji ; Hazen, Timothy J. ; Glass, James R. ; Reynolds, Douglas A.

  • Author_Institution
    Queen's Univ. Belfast, Belfast
  • Volume
    15
  • Issue
    5
  • fYear
    2007
  • fDate
    7/1/2007 12:00:00 AM
  • Firstpage
    1711
  • Lastpage
    1723
  • Abstract
    This paper investigates the problem of speaker identification and verification in noisy conditions, assuming that speech signals are corrupted by environmental noise, but knowledge about the noise characteristics is not available. This research is motivated in part by the potential application of speaker recognition technologies on handheld devices or the Internet. While the technologies promise an additional biometric layer of security to protect the user, the practical implementation of such systems faces many challenges. One of these is environmental noise. Due to the mobile nature of such systems, the noise sources can be highly time-varying and potentially unknown. This raises the requirement for noise robustness in the absence of information about the noise. This paper describes a method that combines multicondition model training and missing-feature theory to model noise with unknown temporal-spectral characteristics. Multicondition training is conducted using simulated noisy data with limited noise variation, providing a "coarse" compensation for the noise, and missing-feature theory is applied to refine the compensation by ignoring noise variation outside the given training conditions, thereby reducing the training and testing mismatch. This paper is focused on several issues relating to the implementation of the new model for real-world applications. These include the generation of multicondition training data to model noisy speech, the combination of different training data to optimize the recognition performance, and the reduction of the model's complexity. The new algorithm was tested using two databases with simulated and realistic noisy speech data. The first database is a redevelopment of the TIMIT database by rerecording the data in the presence of various noise types, used to test the model for speaker identification with a focus on the varieties of noise. The second database is a handheld-device database collected in realistic noisy conditions, used to further validate the model for real-world speaker verification. The new model is compared to baseline systems and is found to achieve lower error rates.
  • Keywords
    acoustic noise; speaker recognition; missing-feature theory; multicondition model training; noisy conditions; robust speaker recognition; speaker identification; speaker verification; speech signals; temporal-spectral characteristics; Databases; Handheld computers; Noise reduction; Noise robustness; Signal processing; Speaker recognition; Speech enhancement; Testing; Training data; Working environment noise; Missing-feature theory; multicondition training; noise compensation; noise modeling; speaker recognition;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2007.899278
  • Filename
    4244529