• DocumentCode
    3423430
  • Title
    Acoustic and pronunciation model adaptation for context-independent and context-dependent pronunciation variability of non-native speech
  • Author
    Oh, Yoo Rhee ; Kim, Mina ; Kim, Hong Kook
  • Author_Institution
    Dept. of Inf. & Commun., Gwangju Inst. of Sci. & Technol. (GIST), Gwangju
  • fYear
    2008
  • fDate
    March 31 2008-April 4 2008
  • Firstpage
    4281
  • Lastpage
    4284
  • Abstract
    In this paper, we propose an acoustic and pronunciation model adaptation method that handles context-independent (CI) and context-dependent (CD) pronunciation variability to improve the performance of a non-native automatic speech recognition (ASR) system. The proposed method proceeds in three steps. First, we perform phone recognition to obtain an n-best list of phoneme sequences and derive pronunciation variant rules using a decision tree. Second, the pronunciation variant rules are decomposed into CI and CD pronunciation variation on the basis of context dependency. That is, some pronunciation variant rules, those dedicated to specific phoneme sequences, are classified as CI pronunciation variation, and the others are classified as CD pronunciation variation. It is assumed here that CI pronunciation variability arises from the different pronunciation space of a non-native speaker's mother tongue, and that CD pronunciation variability arises from coarticulation effects in a context. Third, acoustic model adaptation is performed in a state-tying step for the CI pronunciation variability, using an indirect data-driven method. In addition, pronunciation model adaptation is completed by constructing a multiple pronunciation dictionary using the CD pronunciation variability. Continuous Korean-English ASR experiments show that the proposed method reduces the average word error rate (WER) by 16.02% compared with a baseline ASR system trained on native speech. Moreover, an ASR system using the proposed method provides average WER reductions of 8.95% and 3.67% compared with acoustic model adaptation alone and pronunciation model adaptation alone, respectively.
  • Keywords
    decision trees; speech recognition; acoustic model adaptation; context-dependent pronunciation; context-independent pronunciation; decision tree; nonnative automatic speech recognition; nonnative speaker; nonnative speech variability; phone recognition; pronunciation model adaptation; word error rate; Adaptation model; Automatic speech recognition; Context modeling; Databases; Decision trees; Dictionaries; Error analysis; Loudspeakers; Speech analysis; Tongue; Automatic speech recognition; acoustic model adaptation; non-native speech; pronunciation model adaptation; pronunciation variability;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-1483-3
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2008.4518601
  • Filename
    4518601