An extended model for speech segregation

Author

Hu, Guoning ; Wang, DeLiang

Author_Institution

Biophys. Program, Ohio State Univ., Columbus, OH, USA

Volume

2

fYear

2001

fDate

2001

Firstpage

1089

Abstract

Speech segregation is an important task of auditory scene analysis (ASA), in which the speech of a certain speaker is separated from other interfering signals. Wang and Brown (1999) proposed a multistage neural model for speech segregation, the core of which is a two-layer oscillator network. We extend their model by adding further processes based on psychoacoustic evidence to improve the performance. These processes include estimation of the pitch of target speech and refined generation of a target speech stream with the estimated pitch. Our model is systematically evaluated and compared with the Wang-Brown model, and it yields significantly better performance.

Keywords

neural nets; physiological models; speech processing; speech synthesis; auditory scene analysis; extended model; multistage neural model; pitch estimation; psychoacoustic evidence; speech segregation; two-layer oscillator network; Automatic speech recognition; Image analysis; Oscillators; Psychoacoustic models; Psychology; Speech analysis; Speech enhancement; Speech processing; Speech synthesis; Wideband

fLanguage

English

Publisher

IEEE

Conference_Titel

Neural Networks, 2001. Proceedings. IJCNN '01. International Joint Conference on

Conference_Location

Washington, DC

ISSN

1098-7576

Print_ISBN

0-7803-7044-9

Type

conf

DOI

10.1109/IJCNN.2001.939512

Filename

939512