  • DocumentCode
    2178448
  • Title
    On the use of ideal binary masks for improving phonetic classification
  • Author
    Narayanan, Arun; Wang, DeLiang
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
  • fYear
    2011
  • fDate
    22-27 May 2011
  • Firstpage
    5212
  • Lastpage
    5215
  • Abstract
    Ideal binary masks are binary patterns that encode the masking characteristics of speech in noise. Recent evidence in speech perception suggests that such binary patterns provide sufficient information for human speech recognition. Motivated by these findings, we propose to use ideal binary masks to improve phonetic modeling. We show that by combining the outputs of classifiers trained on the traditional MFCC features and this novel speech pattern, statistically significant improvements over the baseline MFCC-based classifier can be achieved for the task of phonetic classification. Using the combined classifiers, we achieve an error rate of 19.5% on the TIMIT phonetic classification task with multilayer perceptrons as the underlying classifier.
  • Keywords
    speech recognition; MFCC; TIMIT phonetic classification; binary masks; binary patterns; human speech recognition; phonetic classification; Error analysis; Mel frequency cepstral coefficient; Signal to noise ratio; Speech; Training; CASA; TIMIT; ideal binary mask; phone classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
  • Conference_Location
    Prague
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4577-0538-0
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2011.5947532
  • Filename
    5947532