  • DocumentCode
    783873
  • Title
    Automatic Speech Recognition for Under-Resourced Languages: Application to Vietnamese Language
  • Author
    Le, Viet-Bac; Besacier, Laurent
  • Author_Institution
    LIG Lab., Joseph Fourier Univ., Grenoble, France
  • Volume
    17
  • Issue
    8
  • fYear
    2009
  • Firstpage
    1471
  • Lastpage
    1482
  • Abstract
    This paper presents our work on automatic speech recognition (ASR) for under-resourced languages, with application to Vietnamese. Different techniques for bootstrapping acoustic models are presented. First, we present the use of acoustic-phonetic unit distances and the potential of crosslingual acoustic modeling for under-resourced languages. Experimental results on Vietnamese showed that with only a few hours of target-language speech data, crosslingual context-independent modeling worked better than crosslingual context-dependent modeling; however, it was outperformed by the latter when more speech data were available. We concluded that in both cases crosslingual systems are better than monolingual baseline systems. Grapheme-based acoustic modeling, which avoids building a phonetic dictionary, is also investigated. Finally, since the use of sub-word units (morphemes, syllables, characters, etc.) can reduce the high out-of-vocabulary rate and mitigate the lack of text resources in statistical language modeling for under-resourced languages, we propose several methods to decompose, normalize, and combine word and sub-word lattices generated from different ASR systems. The proposed lattice combination scheme yields a relative syllable error rate reduction of 6.6% over the sentence MAP baseline method on a Vietnamese ASR task.
  • Keywords
    acoustic signal processing; natural languages; speech recognition; statistical analysis; ASR; Vietnamese language; automatic speech recognition; crosslingual acoustic modeling; crosslingual context dependent modeling; crosslingual context independent modeling; grapheme-based acoustic modeling; lattice combination scheme; lattice decomposition and combination; monolingual baseline system; phonetic dictionary; sentence MAP baseline method; statistical language modeling; syllable error rate reduction; under-resourced languages
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2009.2021723
  • Filename
    4895261