• DocumentCode
    1099751
  • Title

    Monaural speech segregation based on pitch tracking and amplitude modulation

  • Author

    Hu, Guoning ; Wang, DeLiang

  • Author_Institution
    Biophys. Program, Ohio State Univ., Columbus, OH, USA
  • Volume
    15
  • Issue
    5
  • fYear
    2004
  • Firstpage
    1135
  • Lastpage
    1150
  • Abstract
    Segregating speech from one monaural recording has proven to be very challenging. Monaural segregation of voiced speech has been studied in previous systems that incorporate auditory scene analysis principles. A major problem for these systems is their inability to deal with the high-frequency part of speech. Psychoacoustic evidence suggests that different perceptual mechanisms are involved in handling resolved and unresolved harmonics. We propose a novel system for voiced speech segregation that segregates resolved and unresolved harmonics differently. For resolved harmonics, the system generates segments based on temporal continuity and cross-channel correlation, and groups them according to their periodicities. For unresolved harmonics, it generates segments based on common amplitude modulation (AM) in addition to temporal continuity and groups them according to AM rates. Underlying the segregation process is a pitch contour that is first estimated from speech segregated according to dominant pitch and then adjusted according to psychoacoustic constraints. Our system is systematically evaluated and compared with previous systems, and it yields substantially better performance, especially for the high-frequency part of speech.
  • Keywords
    acoustic signal processing; amplitude modulation; correlation methods; harmonics; speech enhancement; auditory scene analysis; cross-channel correlation; monaural speech segregation; pitch contour; pitch tracking; temporal continuity; voice speech segregation; Automatic speech recognition; Hidden Markov models; Image analysis; Interference; Power harmonic filters; Psychology; Sensor arrays; Speech analysis; AM; computational auditory scene analysis; grouping; segmentation
  • fLanguage
    English
  • Journal_Title
    Neural Networks, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9227
  • Type

    jour

  • DOI
    10.1109/TNN.2004.832812
  • Filename
    1333078