• DocumentCode
    1135635
  • Title

    Building A Highly Accurate Mandarin Speech Recognizer With Language-Independent Technologies and Language-Dependent Modules

  • Author

    Hwang, Mei-Yuh ; Peng, Gang ; Ostendorf, Mari ; Wang, Wen ; Faria, Arlo ; Heidel, Aaron

  • Author_Institution
    Microsoft Corp., Redmond, WA, USA
  • Volume
    17
  • Issue
    7
  • fYear
    2009
  • Firstpage
    1253
  • Lastpage
    1262
  • Abstract
    We describe a system for highly accurate large-vocabulary Mandarin speech recognition. The prevailing hidden-Markov-model-based technologies are essentially language independent and constitute the backbone of our system. These include minimum-phone-error discriminative training and maximum-likelihood linear regression adaptation, among others. In addition, we give careful consideration to Mandarin-specific issues, including lexical word segmentation, tone modeling, phone set design, and automatic acoustic segmentation. Our system comprises two sets of acoustic models for the purpose of cross adaptation. The two subsystems are designed to make complementary errors while achieving similar overall accuracy, by using different phone sets and different combinations of discriminative learning. The outputs of the two subsystems are then rescored by an adapted n-gram language model. Final confusion network combination yielded a 9.1% character error rate on the DARPA GALE 2007 official evaluation, making it the best Mandarin recognition system that year.
  • Keywords
    hidden Markov models; maximum likelihood estimation; regression analysis; speech recognition; adapted n-gram language model; automatic acoustic segmentation; discriminative learning; hidden Markov model; highly accurate large-vocabulary Mandarin speech recognition; language-dependent modules; language-independent technologies; lexical word segmentation; maximum likelihood linear regression adaptation; minimum-phone-error discriminative training; Automatic speech recognition; Character recognition; Error analysis; Feature extraction; Hidden Markov models; Linear regression; Multilayer perceptrons; Natural languages; Speech recognition; Spine; Confusion network combination; GALE; Mandarin automatic speech recognition (ASR); Mandarin pronunciations; Tandem MLP; cross adaptation; discriminative training; hidden activation temporal patterns (HATs); multilayer perceptron (MLP);
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2009.2014263
  • Filename
    5165110