Monaural voiced speech segregation based on combined cues and energy distribution

Author

Zhao, Liheng ; Wang, Zengfu

Author_Institution

Dept. of Autom., Univ. of Sci. & Technol. of China, Hefei, China

fYear

2010

fDate

23-25 Nov. 2010

Firstpage

57

Lastpage

63

Abstract

Monaural speech segregation is important for speech signal processing and has been studied extensively on the basis of auditory scene analysis principles. However, current segregation algorithms cannot achieve satisfactory performance in the high-frequency range. In this paper, we propose a system for monaural voiced speech segregation that investigates two novel ideas. First, combined cues (cross-channel correlation, temporal continuity, and onset/offset) are employed to generate segments in the high-frequency range. Second, the energy distribution of the mixed signal is used to indicate the reliability of the cues in the high-frequency range, and an alternative segmentation strategy is applied accordingly. Systematic evaluation and comparison show that the proposed system improves SNR gain.

Keywords

speech processing; SNR gain; auditory scene analysis; cues distribution; energy distribution; monaural voiced speech segregation algorithm; speech signal processing; systematic evaluation; Correlation; Erbium; Signal to noise ratio; Speech; Speech processing; Wideband

fLanguage

English

Publisher

IEEE

Conference_Titel

Audio Language and Image Processing (ICALIP), 2010 International Conference on

Conference_Location

Shanghai

Print_ISBN

978-1-4244-5856-1

Type

conf

DOI

10.1109/ICALIP.2010.5685014

Filename

5685014