• DocumentCode
    3527583
  • Title
    Acoustic model combination to compensate for residual noise in multi-channel source separation
  • Author
    Yoon, Jae Sam ; Park, Ji Hun ; Kim, Hong Kook

  • Author_Institution
    Dept. of Inf. & Commun., Gwangju Inst. of Sci. & Technol. (GIST), Gwangju
  • fYear
    2009
  • fDate
    19-24 April 2009
  • Firstpage
    3925
  • Lastpage
    3928
  • Abstract
    In this paper, we propose an acoustic model combination technique for reducing the mismatch in a multi-channel noisy environment. To this end, we first apply a mask-based multi-channel source separation method, typically computational auditory scene analysis (CASA), to separate the speech source from noise. However, a certain degree of noise remains in the separated speech source, especially under low signal-to-noise ratio (SNR) conditions, since the estimated mask is not ideal. Thus, the performance of automatic speech recognition (ASR) is limited. To improve ASR performance, the remaining noise can be further compensated in the acoustic model domain under the framework of parallel model combination (PMC). In particular, a noise model for PMC is estimated from the noise remaining after application of the mask-based source separation, and the SNR for PMC is estimated from the average relative magnitude of the mask over the utterance. Experiments show that the proposed acoustic model combination method achieves a relative word error rate reduction of 52.14% compared to mask-based source separation alone.
  • Keywords
    acoustic signal processing; source separation; speech recognition; ASR performance; PMC approach; acoustic model combination method; automatic speech recognition; computational auditory scene analysis; mask-based multichannel source separation; parallel model combination; residual noise; Acoustic noise; Image analysis; Noise reduction; Signal to noise ratio; Speech analysis; Speech coding; Speech enhancement; Working environment noise; mask-based SNR estimation; mask-based noise model estimation; multi-channel source separation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
  • Conference_Location
    Taipei
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-2353-8
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2009.4960486
  • Filename
    4960486