DocumentCode
394701
Title
Speech segregation using event synchronous auditory vocoder
Author
Irino, T. ; Patterson, R.D. ; Kawahara, H.
Author_Institution
CREST-JST, Wakayama Univ., Japan
Volume
5
fYear
2003
fDate
6-10 April 2003
Abstract
We present a new auditory method for segregating concurrent speech sounds. The system is based on an auditory vocoder developed to resynthesize speech from an auditory Mellin representation using the vocoder STRAIGHT (Kawahara, H. et al., Speech Communication, vol. 27, pp. 187-207, 1999). The quality of the transmitted sound is improved by introducing an event-synchronous procedure to estimate glottal pulse times. Unlike conventional window-based processing, the auditory representation preserves fine temporal information, which makes synchronous segregation of the speech possible. The results show that segregation is good even at an SNR of 0 dB: the extracted target speech was slightly distorted but entirely intelligible (like telephone speech), whereas the distracter speech was reduced to a non-speech sound that was not perceptually disturbing. This auditory vocoder has potential for speech enhancement in applications such as hearing aids.
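The abstract reports segregation results at an SNR of 0 dB, i.e. the target and distracter speech have equal power in the mixture. As a minimal illustrative sketch (not the authors' method), the following shows how such a test mixture is typically constructed; the function name `mix_at_snr` and the toy signals are assumptions for illustration only:

```python
import numpy as np

def mix_at_snr(target, distracter, snr_db):
    """Scale the distracter so the mixture has the requested
    target-to-distracter SNR in dB, then sum the two signals."""
    p_t = np.mean(target ** 2)       # target power
    p_d = np.mean(distracter ** 2)   # distracter power
    gain = np.sqrt(p_t / (p_d * 10 ** (snr_db / 10)))
    return target + gain * distracter

# Toy signals standing in for the two talkers.
t = np.sin(2 * np.pi * 100 * np.linspace(0, 1, 8000, endpoint=False))
d = np.random.default_rng(0).standard_normal(8000)

# At 0 dB SNR both sources contribute equal power to the mixture.
m = mix_at_snr(t, d, 0.0)
```

At `snr_db = 0` the gain reduces to `sqrt(p_t / p_d)`, so the scaled distracter's power matches the target's exactly, which is the evaluation condition the abstract refers to.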
Keywords
hearing; parameter estimation; source separation; speech enhancement; speech recognition; speech synthesis; vocoders; auditory representation; concurrent speech sounds; event synchronous auditory vocoder; glottal pulse time estimation; human auditory processing; multispeaker recognition; speech resynthesis; speech segregation; Auditory system; Data mining; Event detection; Frequency; Humans; Speech processing; Telephony; Vocoders
fLanguage
English
Publisher
ieee
Conference_Title
Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03)
ISSN
1520-6149
Print_ISBN
0-7803-7663-3
Type
conf
DOI
10.1109/ICASSP.2003.1200022
Filename
1200022
Link To Document