  • DocumentCode
    3527138
  • Title
    Multiple time resolution analysis of speech signal using MCE training with application to speech recognition
  • Author
    Dimopoulos, Spiros ; Potamianos, Alexandros ; Fosler-Lussier, Eric ; Lee, Chin-Hui
  • Author_Institution
    Dept. of Electron. & Comput. Eng., Tech. Univ. of Crete, Chania
  • fYear
    2009
  • fDate
    19-24 April 2009
  • Firstpage
    3801
  • Lastpage
    3804
  • Abstract
    In this paper, we propose two methods of multiple time-resolution analysis of speech and their application to automatic speech recognition (ASR). Constant frame-rate multi-scale analysis is proposed based on a box of multi-scale features. Then a variable-rate analysis is proposed based on the selection of the optimal temporal resolution on the fly by a properly trained non-linear classifier unit. The classifier's parameters are trained using the discriminative method of minimum classification error (MCE) training. We use the recently proposed conditional random fields (CRF) phonetic recognition system that effectively combines highly correlated features. Results are reported on a frame-wise classification task and also on the TIMIT phone recognition task. Results show that (i) CRFs can effectively combine multi-scale features and (ii) MCE-trained variable-rate CRFs are competitive with the "box" combination method.
  • Keywords
    speech processing; speech recognition; TIMIT phone recognition task; automatic speech recognition; conditional random fields; frame-wise classification task; minimum classification error; multiple time resolution analysis; multiscale features; phonetic recognition; speech signal; Application software; Automatic speech recognition; Computer science; Hidden Markov models; Signal analysis; Signal resolution; Spectral analysis; Speech analysis; Speech processing; Speech recognition; ASR; Conditional Random Fields; MCE; Multiple Frame Rates; Variable Frame Rate;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
  • Conference_Location
    Taipei
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-2353-8
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2009.4960455
  • Filename
    4960455
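The abstract states that the variable-rate analysis picks a temporal resolution on the fly with a classifier trained by minimum classification error (MCE). The record does not give the classifier's form, features, or loss settings, so what follows is only a minimal sketch of the standard sigmoid-smoothed MCE criterion under stated assumptions. A hypothetical linear discriminant g_k(x) = w_k . x scores each of K candidate resolutions; this is a simplification swapped in for the paper's non-linear classifier unit. The label c is assumed to be the resolution that is "correct" for the frame. The misclassification measure is d = -g_c + (1/eta) log((1/(K-1)) sum_{k != c} exp(eta g_k)), and the loss is 1/(1 + exp(-gamma d)). The names `discriminants`, `mce_loss_and_grad`, `train_mce`, `predict`, `eta`, `gamma`, and the toy data are illustrative and are not taken from the paper.

```python
import math
import random

# Minimal smoothed-MCE sketch. ASSUMPTIONS (not from the paper): a linear
# discriminant g_k(x) = w_k . x per candidate time resolution k, and the
# label c is the resolution assumed best for the frame described by x.

def discriminants(W, x):
    """Return g_k(x) = w_k . x for every candidate resolution k."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for w in W]

def mce_loss_and_grad(W, x, c, eta=2.0, gamma=1.0):
    """Sigmoid-smoothed MCE loss for one sample and its gradient w.r.t. W.

    d    = -g_c + (1/eta) * log(mean over k != c of exp(eta * g_k))
    loss = 1 / (1 + exp(-gamma * d))
    """
    g = discriminants(W, x)
    others = [k for k in range(len(W)) if k != c]
    m = max(eta * g[k] for k in others)          # shift for numerical stability
    exps = {k: math.exp(eta * g[k] - m) for k in others}
    z = sum(exps.values())
    d = -g[c] + (m + math.log(z / len(others))) / eta
    loss = 1.0 / (1.0 + math.exp(-gamma * d))
    dloss_dd = gamma * loss * (1.0 - loss)       # derivative of the sigmoid
    grad = [[0.0] * len(x) for _ in W]
    for j, xj in enumerate(x):
        grad[c][j] = -dloss_dd * xj              # dd/dg_c = -1
        for k in others:
            grad[k][j] = dloss_dd * (exps[k] / z) * xj  # dd/dg_k = softmax weight
    return loss, grad

def train_mce(data, n_classes, lr=0.5, epochs=50, seed=0):
    """Plain per-sample gradient descent on the MCE loss."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    W = [[0.0] * dim for _ in range(n_classes)]
    order = list(data)                           # shuffle a copy, not the caller's list
    for _ in range(epochs):
        rng.shuffle(order)
        for x, c in order:
            _, grad = mce_loss_and_grad(W, x, c)
            for k in range(n_classes):
                for j in range(dim):
                    W[k][j] -= lr * grad[k][j]
    return W

def predict(W, x):
    """Pick the resolution with the largest discriminant score."""
    g = discriminants(W, x)
    return max(range(len(g)), key=g.__getitem__)

if __name__ == "__main__":
    # Toy, made-up data: 2-D features plus a constant bias term, 3 resolutions.
    data = [([1.0, 0.1, 1.0], 0), ([0.9, -0.1, 1.0], 0),
            ([0.1, 1.0, 1.0], 1), ([-0.1, 0.9, 1.0], 1),
            ([-1.0, -1.0, 1.0], 2), ([-0.8, -1.1, 1.0], 2)]
    W = train_mce(data, n_classes=3)
    print([predict(W, x) for x, _ in data])
```

Replacing the linear g_k with a non-linear scorer would change only `discriminants` and the chain rule through it. The misclassification measure and the sigmoid smoothing stay the same. A larger `eta` makes the competitor term approach the score of the single best rival resolution.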