• DocumentCode
    738135
  • Title

    Acoustic Analysis for Automatic Speech Recognition

  • Author

    O'Shaughnessy, D.

  • Author_Institution
    Inst. Nat. de la Rech. Sci. (INRS), Univ. of Quebec, Montreal, QC, Canada
  • Volume
    101
  • Issue
    5
  • fYear
    2013
  • fDate
    5/1/2013 12:00:00 AM
  • Firstpage
    1038
  • Lastpage
    1053
  • Abstract
    As a pattern recognition application, automatic speech recognition (ASR) requires the extraction of useful features from its input signal, speech. To help determine relevance, human speech production and acoustic aspects of speech perception are reviewed, to identify acoustic elements likely to be most important for ASR. Common methods of estimating useful aspects of speech spectral envelopes are reviewed, from the point of view of efficiency and reliability in mismatched conditions. Because many speech inputs for ASR have noise and channel degradations, ways to improve robustness in speech parameterization are analyzed. While the main focus in ASR is to obtain spectral envelope measures, human speech communication efficiently exploits the manipulation of one's vocal-cord vibration rate [fundamental frequency (F0)], and so F0 extraction and its integration into ASR are also reviewed. For the acoustic analysis reviewed here for ASR, this work presents modern methods as well as future perspectives on important aspects of speech information processing.
  • Keywords
    reliability; speech recognition; ASR; acoustic analysis; automatic speech recognition; channel degradations; human speech communication; human speech production; pattern recognition application; speech information processing; speech perception; speech spectral envelopes; vocal-cord vibration rate; digital signal processing; information processing; pattern recognition; spectral analysis; speech processing; time-frequency analysis; speech analysis; time-frequency representation
  • fLanguage
    English
  • Journal_Title
    Proceedings of the IEEE
  • Publisher
    IEEE
  • ISSN
    0018-9219
  • Type

    jour

  • DOI
    10.1109/JPROC.2013.2251592
  • Filename
    6494580