  • DocumentCode
    3528674
  • Title
    Cross-lingual speech recognition under runtime resource constraints
  • Author
    Yu, Dong ; Deng, Li ; Liu, Peng ; Wu, Jian ; Gong, Yifan ; Acero, Alex
  • Author_Institution
    Microsoft Corp., Redmond, WA
  • fYear
    2009
  • fDate
    19-24 April 2009
  • Firstpage
    4193
  • Lastpage
    4196
  • Abstract
    This paper proposes and compares four cross-lingual and bilingual automatic speech recognition techniques under the constraint that only the acoustic model (AM) of the native language is used at runtime. The first three techniques fall into the category of lexicon conversion, in which each phoneme sequence (PHS) in the foreign language (FL) lexicon is mapped to a native language (NL) phoneme sequence. The first technique determines the PHS mapping through international phonetic alphabet (IPA) features; the second and third techniques are data-driven. They determine the mapping by converting the PHS into the corresponding context-independent and context-dependent hidden Markov models (HMMs), respectively, and searching for the NL PHS with the least Kullback-Leibler divergence (KLD) between the HMMs. The fourth technique falls into the category of AM merging, in which the FL's AM is merged into the NL's AM by mapping each senone in the FL's AM to the senone in the NL's AM with the minimum KLD. We discuss the strengths and limitations of each technique, report empirical evaluation results on recognizing English utterances with a Korean recognizer, and demonstrate the high correlation between the average KLD and the word error rate (WER). The results show that the AM merging technique performs best, achieving a 60% relative WER reduction over the IPA-based technique.
  • Keywords
    natural language processing; speech recognition; acoustic model; bilingual automatic speech recognition techniques; context-dependent hidden Markov models; cross-lingual speech recognition; foreign language lexicon; international phonetic alphabet; lexicon conversion; native language; phoneme sequence; runtime resource constraints; word error rate; Automatic control; Automatic speech recognition; Automobiles; Error analysis; Hidden Markov models; Keyboards; Merging; Natural languages; Runtime; Speech recognition; Cross-lingual speech recognition; Kullback-Leibler divergence; lexicon conversion; resource constraint; senone mapping;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
  • Conference_Location
    Taipei
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-2353-8
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2009.4960553
  • Filename
    4960553
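The AM-merging technique summarized in the abstract (map each FL senone to the NL senone with the minimum KLD) can be sketched as below. This is an illustrative sketch, not the authors' implementation. It assumes each senone is a single diagonal-covariance Gaussian, so the KLD has a closed form (real senones are usually Gaussian mixtures, whose KLD must be approximated). It also assumes the direction KL(FL || NL) and an unweighted mean for the "average KLD" the abstract correlates with WER. The names `kld_diag_gaussian`, `map_senones`, and `average_mapping_kld`, and the senone IDs in the example, are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical senone representation (assumption): a single diagonal-covariance
# Gaussian given as (means, variances). Real senones are typically GMMs.
Gaussian = Tuple[List[float], List[float]]


def kld_diag_gaussian(p: Gaussian, q: Gaussian) -> float:
    """Closed-form KL(p || q) between two diagonal-covariance Gaussians."""
    mu_p, var_p = p
    mu_q, var_q = q
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        # Per-dimension term: log(vq/vp) + (vp + (mp - mq)^2) / vq - 1
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def map_senones(fl_senones: Dict[str, Gaussian],
                nl_senones: Dict[str, Gaussian]) -> Dict[str, str]:
    """Map each FL senone to the NL senone with minimum KL(FL || NL)."""
    mapping = {}
    for fl_id, fl_dist in fl_senones.items():
        # Exhaustive search over all NL senones for the closest one.
        mapping[fl_id] = min(
            nl_senones,
            key=lambda nl_id: kld_diag_gaussian(fl_dist, nl_senones[nl_id]),
        )
    return mapping


def average_mapping_kld(fl_senones: Dict[str, Gaussian],
                        nl_senones: Dict[str, Gaussian],
                        mapping: Dict[str, str]) -> float:
    """Unweighted mean KLD over all mapped senone pairs (assumed averaging)."""
    if not mapping:
        raise ValueError("mapping is empty")
    total = sum(kld_diag_gaussian(fl_senones[f], nl_senones[n])
                for f, n in mapping.items())
    return total / len(mapping)


if __name__ == "__main__":
    # Toy 2-dimensional senones with hypothetical IDs.
    fl = {"en_s1": ([0.0, 1.0], [1.0, 1.0]), "en_s2": ([3.0, 3.0], [1.0, 2.0])}
    nl = {"ko_a": ([0.1, 0.9], [1.0, 1.0]), "ko_b": ([3.0, 2.8], [1.0, 2.0])}
    m = map_senones(fl, nl)
    print(m)  # prints {'en_s1': 'ko_a', 'en_s2': 'ko_b'}
    print(average_mapping_kld(fl, nl, m))
```

The exhaustive `min` search is quadratic in the number of senones. That is acceptable as an offline step, because in this setting only the merged NL model is needed at runtime.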