• DocumentCode
    2021065
  • Title
    Monaural voiced speech segregation based on combined cues and energy distribution
  • Author
    Zhao, Liheng ; Wang, Zengfu
  • Author_Institution
    Dept. of Autom., Univ. of Sci. & Technol. of China, Hefei, China
  • fYear
    2010
  • fDate
    23-25 Nov. 2010
  • Firstpage
    57
  • Lastpage
    63
  • Abstract
    Monaural speech segregation is important for speech signal processing and has been studied extensively on the basis of auditory scene analysis principles. However, current segregation algorithms cannot achieve satisfactory performance in the high-frequency range. In this paper, we propose a system for monaural voiced speech segregation in which two novel ideas are investigated. First, combined cues (including cross-channel correlation, temporal continuity, and onset/offset) are employed to generate segments in the high-frequency range. Second, the energy distribution of the mixed signal is used to indicate the reliability of the cues in the high-frequency range, and an alternative segmentation strategy is applied accordingly. Systematic evaluation and comparison show that the proposed system improves SNR gain.
  • Keywords
    speech processing; SNR gain; auditory scene analysis; cues distribution; energy distribution; monaural voiced speech segregation algorithm; speech signal processing; systematic evaluation; Correlation; Erbium; Signal to noise ratio; Speech; Speech processing; Wideband;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Audio Language and Image Processing (ICALIP), 2010 International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4244-5856-1
  • Type
    conf
  • DOI
    10.1109/ICALIP.2010.5685014
  • Filename
    5685014