• DocumentCode
    394351
  • Title
    Split-lexicon based hierarchical recognition of speech using syllable and word level acoustic units
  • Author
    Sethy, Abhinav; Narayanan, Shrikanth
  • Author_Institution
    Dept. of Electr. Eng. Syst., Univ. of Southern California, Los Angeles, CA, USA
  • Volume
    1
  • fYear
    2003
  • fDate
    6-10 April 2003
  • Abstract
    Most speech recognition systems, especially LVCSR systems, use context-dependent (CD) phones as the basic acoustic unit for recognition. The primary motive for this is the relative ease with which phone-based systems can be trained robustly with small amounts of data. However, as recent research indicates, significant improvements in recognition accuracy can be gained by using acoustic units of longer duration, such as syllables. Syllables and other longer units provide an efficient way of modeling long-term temporal dependencies in speech, which are difficult to cover in a phoneme-based recognition framework. These longer units, however, suffer from the training data sparsity problem, since a large number of units in the lexicon will have little or no acoustic training data. In this paper we present a two-step approach to address the training data sparsity problem. First, we use CD phones to initialize the higher-level units in a manner that minimizes the impact of training data sparsity. Subsequently, we present methods to split the lexicon into units of different acoustic length based on an analysis of the training data. We present results which show that a 25-30% improvement in word error rate can be achieved by using CD phone initialization and variable-length unit selection on a medium-vocabulary continuous speech recognition task.
  • Keywords
    error statistics; speech processing; speech recognition; vocabulary; CD phone initialization; CD phones; LVCSR; continuous speech recognition task; long term temporal dependencies; medium vocabulary; modeling; split-lexicon based hierarchical recognition; syllables; training data sparsity problem; two step approach; variable length unit selection; word error rate; word level acoustic units; Acoustical engineering; Automatic speech recognition; Context modeling; Error analysis; Feature extraction; Robustness; Speech analysis; Speech recognition; Training data; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7663-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.2003.1198895
  • Filename
    1198895