• DocumentCode
    2113535
  • Title
    Robustifying cepstral features by mitigating the outlier effect for noisy speech recognition
  • Author
    Hao-teng Fan ; Kuan-wei Hsieh ; Chien-Hao Huang ; Jeih-weih Hung
  • Author_Institution
    Dept. of Electr. Eng., Nat. Chi Nan Univ., Puli, Taiwan
  • fYear
    2013
  • fDate
    23-25 July 2013
  • Firstpage
    935
  • Lastpage
    939
  • Abstract
    Noise interference often seriously degrades the performance of automatic speech recognition (ASR) systems. Among noise-reduction techniques, cepstral mean-and-variance normalization (CMVN) is a simple yet quite effective approach for processing MFCC speech features. However, CMVN-processed features still contain a significant number of outliers, which very likely weaken the effect of CMVN. This paper proposes two ways of dealing with the outliers left by CMVN. The first applies a sigmoid function transformation, which imposes explicit lower and upper bounds on the outliers; the second uses the well-known median filter to remove impulse-like outliers from the CMVN features. On the Aurora-2 digit recognition database and task, the two frameworks yield an absolute accuracy improvement of around 5% over CMVN, and the corresponding word error rate reduction relative to the MFCC baseline is as high as 50%.
  • Keywords
    audio databases; cepstral analysis; feature extraction; median filters; speech recognition; ASR systems; Aurora-2 digit recognition database; CMVN; MFCC speech feature processing; automatic speech recognition system; cepstral features robustification; cepstral mean-and-variance normalization; impulse-like outliers; median filter; noise effect; noise interference; noisy speech recognition; outlier effect mitigation; sigmoid function transformation; word error rate reduction; Frequency modulation; Fuzzy systems; Knowledge discovery; Mel frequency cepstral coefficient; Robustness; Signal to noise ratio; cepstral mean and variance normalization; noise robustness; sigmoid function
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery (FSKD), 2013 10th International Conference on
  • Conference_Location
    Shenyang
  • Type
    conf
  • DOI
    10.1109/FSKD.2013.6816329
  • Filename
    6816329
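
The processing chain described in the abstract (CMVN, followed by either a sigmoid bound or a median filter) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the logistic form of the sigmoid, its slope `alpha`, and the filter width `width` are all assumptions made for illustration, since the record does not specify them. Each function operates on one non-empty cepstral coefficient trajectory (one value per frame).

```python
import math
import statistics


def cmvn(track):
    """Normalize one coefficient trajectory to zero mean and unit variance."""
    mean = statistics.fmean(track)
    std = statistics.pstdev(track)
    if std == 0.0:
        # A constant trajectory carries no variation to normalize.
        return [0.0 for _ in track]
    return [(x - mean) / std for x in track]


def sigmoid_bound(track, alpha=1.0):
    """Squash values into (0, 1) with a logistic function.

    The logistic form and the slope `alpha` are assumptions; the abstract
    only states that a sigmoid gives explicit lower and upper bounds.
    """
    return [1.0 / (1.0 + math.exp(-alpha * x)) for x in track]


def median_smooth(track, width=3):
    """Apply a sliding median of odd length `width` along time.

    Edges are handled by repeating the first and last frames, which is
    one common choice and an assumption here.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    padded = [track[0]] * half + list(track) + [track[-1]] * half
    return [statistics.median(padded[i:i + width]) for i in range(len(track))]


# Toy trajectory with one impulse-like outlier at frame 3.
track = [0.1, 0.2, 0.1, 9.0, 0.2, 0.1, 0.2, 0.1]
normalized = cmvn(track)
bounded = sigmoid_bound(normalized)    # outlier is capped below 1.0
smoothed = median_smooth(normalized)   # isolated impulse is removed
```

In this toy case the outlier survives CMVN as the largest normalized value; the sigmoid limits how far it can deviate, while the median filter replaces it with a neighboring value.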