  • DocumentCode
    33688
  • Title
    Corpus-Based Speech Enhancement With Uncertainty Modeling and Cepstral Smoothing
  • Author
    Nickel, Robert M. ; Astudillo, Ramon Fernandez ; Kolossa, Dorothea ; Martin, Rainer
  • Author_Institution
    Dept. of Electr. Eng., Bucknell Univ., Lewisburg, PA, USA
  • Volume
    21
  • Issue
    5
  • fYear
    2013
  • fDate
    May-13
  • Firstpage
    983
  • Lastpage
    997
  • Abstract
    We present a new approach for corpus-based speech enhancement that significantly improves over a method published by Xiao and Nickel in 2010. Corpus-based enhancement systems do not merely filter an incoming noisy signal, but resynthesize its speech content via an inventory of pre-recorded clean signals. The goal of the procedure is to perceptually improve the sound of speech signals in background noise. The proposed new method modifies Xiao's method in four significant ways. Firstly, it employs a Gaussian mixture model (GMM) instead of a vector quantizer in the phoneme recognition front-end. Secondly, the state decoding of the recognition stage is supported with an uncertainty modeling technique. With the GMM and the uncertainty modeling it is possible to eliminate the need for noise-dependent system training. Thirdly, the post-processing of the original method via sinusoidal modeling is replaced with a powerful cepstral smoothing operation. And lastly, due to the improvements of these modifications, it is possible to extend the operational bandwidth of the procedure from 4 kHz to 8 kHz. The performance of the proposed method was evaluated across different noise types and different signal-to-noise ratios. The new method was able to significantly outperform traditional methods, including the one by Xiao and Nickel, in terms of PESQ scores and other objective quality measures. Results of subjective CMOS tests over a smaller set of test samples support our claims.
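The abstract names cepstral smoothing as the replacement for the sinusoidal-model post-processing but does not describe the operation itself. As a rough illustration only, the sketch below applies a generic form of temporal cepstral smoothing to a sequence of spectral gain frames. Each frame's log-gain is taken to the cepstral domain. Low quefrencies (spectral envelope) are smoothed with one recursive constant and higher quefrencies (fine structure) with another, and the result is mapped back. The function name `cepstral_smooth_gains`, its parameters, and the default constants are assumptions for illustration and may differ from the operation used in the paper.

```python
import numpy as np


def cepstral_smooth_gains(gains, alpha_low=0.0, alpha_high=0.9, n_low=3, floor=1e-3):
    """Recursively smooth spectral gain frames over time in the cepstral domain.

    Hypothetical helper for illustration, not the paper's implementation.
    gains: (T, K) array of real gains, K = n_fft // 2 + 1 frequency bins.
    alpha_low / alpha_high: smoothing constants for low / high quefrencies
    (illustrative values, not taken from the paper).
    n_low: number of low-quefrency coefficients treated as spectral envelope.
    floor: lower bound applied before taking the logarithm.
    """
    gains = np.maximum(np.asarray(gains, dtype=float), floor)
    T, K = gains.shape
    n_fft = 2 * (K - 1)

    # Quefrency-dependent constants, symmetric in q so the smoothed
    # cepstrum stays even and maps back to a real log-spectrum.
    q = np.arange(n_fft)
    q_sym = np.minimum(q, n_fft - q)
    alpha = np.where(q_sym < n_low, alpha_low, alpha_high)

    out = np.empty_like(gains)
    c_smooth = None
    for t in range(T):
        # Real cepstrum of the current log-gain frame.
        c = np.fft.irfft(np.log(gains[t]), n=n_fft)
        # First-order recursive smoothing, one constant per quefrency bin.
        c_smooth = c if c_smooth is None else alpha * c_smooth + (1.0 - alpha) * c
        # Back to the frequency domain; imaginary parts are numerically zero.
        out[t] = np.exp(np.fft.rfft(c_smooth).real)

    # Unequal constants can push some bins outside the gain range; clip back.
    return np.clip(out, floor, 1.0)
```

Setting both smoothing constants to zero returns the input unchanged, which is a convenient sanity check. Some published cepstral smoothing schemes also treat the quefrency bin of the pitch peak separately; this sketch omits that.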
  • Keywords
    Gaussian processes; signal denoising; smoothing methods; speech enhancement; speech recognition; speech synthesis; GMM; Gaussian mixture model; PESQ scores; Xiao method; background noise; bandwidth 4 kHz to 8 kHz; cepstral smoothing; cepstral smoothing operation; corpus-based speech enhancement system; noise dependent system training elimination; noisy signal; phoneme recognition front-end; prerecorded clean signal inventory; signal-to-noise ratios; sinusoidal modeling; speech content resynthesis; speech signals; state decoding; subjective CMOS tests; uncertainty modeling; uncertainty modeling technique; vector quantizer; Cepstral analysis; Nickel; Noise; Speech; Speech enhancement; Speech recognition; Uncertainty; Inventory-style speech enhancement; modified imputation; uncertainty-of-observation techniques;
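The keywords list modified imputation among the uncertainty-of-observation techniques, and the abstract says uncertainty modeling supports the state decoding of the GMM-based recognition stage. Below is a minimal sketch of modified imputation for one feature frame. It assumes a diagonal-covariance GMM and a Gaussian posterior for the clean feature with mean `x_hat` and variance `var_x`; how those are estimated is outside the sketch. For each component, the feature is replaced by the MAP estimate that combines the component prior with the observation posterior, and the component density is evaluated at that point. The function name and argument layout are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def modified_imputation_loglik(x_hat, var_x, weights, means, variances):
    """Score one uncertain feature frame against a diagonal-covariance GMM.

    Hypothetical helper for illustration.
    x_hat:     (D,) posterior mean of the clean feature vector
    var_x:     (D,) posterior variance (observation uncertainty)
    weights:   (K,) mixture weights summing to 1
    means:     (K, D) component means
    variances: (K, D) diagonal component variances
    """
    x_hat = np.asarray(x_hat, dtype=float)
    var_x = np.asarray(var_x, dtype=float)
    # Per-component MAP estimate of the clean feature, combining the component
    # prior N(means, variances) with the observation posterior N(x_hat, var_x):
    # x_mi = (means * var_x + x_hat * variances) / (variances + var_x)
    x_mi = (means * var_x + x_hat * variances) / (variances + var_x)
    # Log-density of each diagonal Gaussian at its own imputed point.
    log_comp = -0.5 * np.sum(
        np.log(2.0 * np.pi * variances) + (x_mi - means) ** 2 / variances, axis=1
    )
    # Mixture log-likelihood via log-sum-exp for numerical stability.
    a = np.log(weights) + log_comp
    m = np.max(a)
    return float(m + np.log(np.sum(np.exp(a - m))))
```

As the uncertainty `var_x` shrinks toward zero, the score approaches the ordinary GMM log-likelihood at `x_hat`. As it grows, each component is evaluated at its own mean, so unreliable feature dimensions stop discriminating between components.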
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2013.2243434
  • Filename
    6423260