• DocumentCode
    2059973
  • Title
    Acquiring variable length speech bases for factorisation-based noise robust speech recognition
  • Author
    Hurmalainen, Antti ; Virtanen, Tuomas
  • Author_Institution
    Tampere Univ. of Technol., Tampere, Finland
  • fYear
    2013
  • fDate
    9-13 Sept. 2013
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Studies from multiple disciplines show that the spectro-temporal units of natural languages and human speech perception are longer than the short-time frames commonly employed in automatic speech recognition. Extended temporal context is also beneficial for the separation of concurrent sound sources such as speech and noise. However, the length of patterns in speech varies greatly, making them difficult to model with fixed-length units. We propose methods for acquiring variable-length speech atom bases for an accurate yet compact representation of speech with a large temporal context. Bases are generated from spectral features, from assigned state labels, and as a combination of both. Results for factorisation-based speech recognition in noisy conditions show equal or better separation and recognition quality in comparison to fixed-length units, while model sizes are reduced by up to 40%.
  • Keywords
    acoustic generators; acoustic radiators; feature extraction; matrix decomposition; natural languages; pattern clustering; speech recognition; speech synthesis; automatic speech recognition; extended temporal context; factorisation-based noise; human speech perception; natural languages; sound sources; spectral features; spectro-temporal units; variable length speech; Context; Correlation; Mathematical model; Noise; Spectrogram; Speech; Speech recognition; Spectral factorization; noise robustness; speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing Conference (EUSIPCO), 2013 Proceedings of the 21st European
  • Conference_Location
    Marrakech
  • Type
    conf
  • Filename
    6811688