DocumentCode
394701
Title
Speech segregation using event synchronous auditory vocoder
Author
Irino, T. ; Patterson, R.D. ; Kawahara, H.
Author_Institution
CREST-JST, Wakayama Univ., Japan
Volume
5
fYear
2003
fDate
6-10 April 2003
Abstract
We present a new auditory method for segregating concurrent speech sounds. The system is based on an auditory vocoder developed to resynthesize speech from an auditory Mellin representation using the vocoder STRAIGHT (Kawahara, H. et al., Speech Communication, vol. 27, pp. 187-207, 1999). The quality of the transmitted sound is improved by introducing an event-synchronous procedure to estimate glottal pulse times. Unlike conventional window-based processing, the auditory representation preserves fine temporal information, which makes synchronous segregation of the speech possible. The results show that segregation is good even at an SNR of 0 dB: the extracted target speech was slightly distorted but entirely intelligible (like telephone speech), whereas the distracter speech was reduced to a non-speech sound that was not perceptually disturbing. This auditory vocoder has potential for speech enhancement in applications such as hearing aids.
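The abstract reports segregation results at an SNR of 0 dB, i.e. the target and distracter speech have equal power in the mixture. As a minimal illustrative sketch (not the authors' method), the following shows how such a test mixture is typically constructed; the function name `mix_at_snr` and the toy signals are assumptions for illustration only:

```python
import numpy as np

def mix_at_snr(target, distracter, snr_db):
    """Scale the distracter so the mixture has the requested
    target-to-distracter SNR in dB, then sum the two signals."""
    p_t = np.mean(target ** 2)       # target power
    p_d = np.mean(distracter ** 2)   # distracter power
    gain = np.sqrt(p_t / (p_d * 10 ** (snr_db / 10)))
    return target + gain * distracter

# Toy signals standing in for the two talkers.
t = np.sin(2 * np.pi * 100 * np.linspace(0, 1, 8000, endpoint=False))
d = np.random.default_rng(0).standard_normal(8000)

# At 0 dB SNR both sources contribute equal power to the mixture.
m = mix_at_snr(t, d, 0.0)
```

At `snr_db = 0` the gain reduces to `sqrt(p_t / p_d)`, so the scaled distracter's power matches the target's exactly, which is the evaluation condition the abstract refers to.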
Keywords
hearing; parameter estimation; source separation; speech enhancement; speech recognition; speech synthesis; vocoders; auditory representation; concurrent speech sounds; event synchronous auditory vocoder; glottal pulse time estimation; human auditory processing; multispeaker recognition; speech resynthesis; speech segregation; Auditory system; Data mining; Event detection; Frequency; Humans; Speech processing; Telephony; Vocoders
fLanguage
English
Publisher
ieee
Conference_Title
Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03)
ISSN
1520-6149
Print_ISBN
0-7803-7663-3
Type
conf
DOI
10.1109/ICASSP.2003.1200022
Filename
1200022
Link To Document