• DocumentCode
    735033
  • Title
    Reducing morpho-phonetic confusion in sub-word based Uyghur ASR
  • Author
    Ablimit, Mijit ; Hamdulla, Askar ; Pattar, Akbar
  • Author_Institution
    Postdoctoral Res. Station of Comput. Sci. & Technol., Xinjiang Univ., Urumqi, China
  • fYear
    2015
  • fDate
    12-15 July 2015
  • Firstpage
    348
  • Lastpage
    352
  • Abstract
    Sub-word units such as morphemes are selected as the lexicon for highly inflectional languages, as they can provide better coverage and a smaller vocabulary size. However, short units shrink the context of statistical models, are prone to morpho-phonetic changes, and do not always outperform word-based models. When sequences of units are merged or split, the unit boundaries are phonetically harmonized in speech, which is reflected as morpho-phonetic changes in the text. This paper investigates morpho-phonetic confusions in the sub-word segmentation of Uyghur text, and the phonetic factors that affect automatic speech recognition (ASR) accuracy. An optimal lexicon set is obtained by comparing the ASR results of different layers of lexica, which avoids phonetic confusions in frequently misrecognized morpheme sequences. This optimal lexicon, obtained entirely with a hidden Markov model (HMM) based acoustic model, outperformed all the baseline linguistic units. When all these units were directly incorporated into a deep neural network (DNN) based acoustic model, without changing the training corpora or language models, the optimal lexicon not only drastically improved ASR accuracy but also outperformed the other units, demonstrating the generality of our approach. Experimental results show that the optimal lexicon obtained by reducing morpho-phonetic confusions yields better ASR accuracy and robustness.
  • Keywords
    hidden Markov models; natural language processing; neural nets; speech processing; speech recognition; ASR accuracy; DNN based acoustic model; HMM based acoustic model; Uyghur ASR; Uyghur text; automatic speech recognition; baseline linguistic unit; deep neural network; inflectional language; language model; morpheme; morpho-phonetic change; morpho-phonetic confusion; optimal lexicon set; phonetic reason; statistical model; sub-word segmentation; training corpora; vocabulary size; Accuracy; Acoustics; Hidden Markov models; Speech; Surface morphology; Training; Vocabulary; ASR; DNN; Uyghur; morphology; phonetic;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal and Information Processing (ChinaSIP), 2015 IEEE China Summit and International Conference on
  • Conference_Location
    Chengdu
  • Type
    conf
  • DOI
    10.1109/ChinaSIP.2015.7230422
  • Filename
    7230422