• DocumentCode
    1548036
  • Title

    Cochannel speaker separation by harmonic enhancement and suppression

  • Author

    Morgan, David P. ; George, E. Bryan ; Lee, Leonard T. ; Kay, Steven M.

  • Author_Institution
    Signal Process. Center of Technol., Lockheed-Martin Inc., Nashua, NH, USA
  • Volume
    5
  • Issue
    5
  • fYear
    1997
  • fDate
    9/1/1997
  • Firstpage
    407
  • Lastpage
    424
  • Abstract
    This paper presents a system for separating the cochannel speech of two talkers. The proposed harmonic enhancement and suppression (HES) system is based on a frame-by-frame speaker separation algorithm that exploits the pitch estimate of the stronger talker derived from the cochannel signal. The idea behind this approach is to recover the stronger talker's speech by enhancing their harmonic frequencies and formants given a multiresolution pitch estimate. The weaker talker's speech is obtained from the residual signal created when the harmonics and formants of the stronger talker are suppressed. An automatic speaker assignment algorithm is used to place recovered frames from the target and interfering talkers in separate channels. Automatic speaker assignment performs reasonably well in most cochannel environments, including voiced-on-voiced, voiced-on-unvoiced, unvoiced-on-unvoiced, assignment after processing silence intervals, and single talker speech (no cochannel interference). The HES system has been tested at target-to-interferer ratios (TIRs) from -18 to 18 dB with widely available databases. It has demonstrated improved performance in keyword spotting tests for TIR values of 6, 12, and 18 dB, and in human listening tests for TIR values of -6 and -18 dB.
  • Keywords
    harmonics; maximum likelihood detection; parameter estimation; signal resolution; speech enhancement; speech processing; automatic speaker assignment; automatic speaker assignment algorithm; cochannel environments; cochannel signal; cochannel speaker separation; cochannel speech separation; databases; formants; harmonic enhancement; harmonic frequencies; harmonic suppression; human listening tests; keyword spotting tests; multiresolution pitch estimate; pitch detection; residual signal; silence intervals processing; single talker speech; speaker separation algorithm; target-to-interferer ratios; unvoiced-on-unvoiced environment; voiced-on-voiced environment; voiced-on-unvoiced environment; Digital signal processing; Frequency estimation; Signal processing; Signal processing algorithms; Speech analysis; Testing
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type

    jour

  • DOI
    10.1109/89.622561
  • Filename
    622561
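
The abstract describes recovering the stronger talker by enhancing bins at the harmonics of that talker's pitch and obtaining the weaker talker from the residual left after those harmonics are suppressed. The sketch below illustrates only that enhance/suppress idea on a single frame, and it is not the HES algorithm from the paper. It assumes the stronger talker's pitch is already known, uses a simple binary mask in the FFT domain, and ignores formants, multiresolution pitch estimation, windowing, and speaker assignment. The function names `harmonic_mask` and `separate_frame` and the parameter `half_width_hz` are hypothetical, chosen for illustration.

```python
import numpy as np


def harmonic_mask(n_fft, fs, f0, half_width_hz=20.0):
    """Boolean mask over rfft bins that lie near a harmonic of f0.

    Hypothetical helper: a bin is kept when its frequency is within
    half_width_hz of the nearest multiple k * f0 (k >= 1).
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    # Index of the nearest harmonic for each bin; k = 0 (DC) is excluded.
    k = np.maximum(np.round(freqs / f0), 1)
    return np.abs(freqs - k * f0) <= half_width_hz


def separate_frame(frame, fs, f0_strong, half_width_hz=20.0):
    """Split one frame into (enhanced stronger talker, residual).

    Assumes f0_strong, the stronger talker's pitch in Hz, is given.
    The residual is whatever remains after the masked bins are removed.
    """
    spectrum = np.fft.rfft(frame)
    mask = harmonic_mask(len(frame), fs, f0_strong, half_width_hz)
    strong = np.fft.irfft(spectrum * mask, n=len(frame))
    residual = np.fft.irfft(spectrum * ~mask, n=len(frame))
    return strong, residual
```

Because the two masks are complementary, the two outputs always sum back to the input frame. On real speech the harmonics of the two talkers overlap and the pitch varies within a frame, so a binary mask like this one would leak energy between channels; it is meant only to make the enhancement and suppression steps concrete.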