DocumentCode :
735033
Title :
Reducing morpho-phonetic confusion in sub-word based Uyghur ASR
Author :
Ablimit, Mijit ; Hamdulla, Askar ; Pattar, Akbar
Author_Institution :
Postdoctoral Res. Station of Comput. Sci. & Technol., Xinjiang Univ., Urumqi, China
fYear :
2015
fDate :
12-15 July 2015
Firstpage :
348
Lastpage :
352
Abstract :
Sub-word units such as morphemes are often selected as the lexicon for highly inflectional languages, because they provide better coverage with a smaller vocabulary. However, short units shrink the context of statistical models, are prone to morpho-phonetic changes, and do not always outperform the word-based model. When sequences of units are merged or split, unit boundaries are phonetically harmonized in speech, which is reflected as morpho-phonetic changes in the text. This paper investigates morpho-phonetic confusions in the sub-word segmentation of Uyghur text and the phonetic reasons that affect automatic speech recognition (ASR) accuracy. An optimal lexicon set that avoids phonetic confusions in frequently misrecognized morpheme sequences is obtained by comparing ASR results across different layers of lexica. This optimal lexicon, which is derived entirely from an HMM-based acoustic model, outperformed all the baseline linguistic units. When all these units are directly incorporated into a deep neural network (DNN) based acoustic model, without changing the training corpora or language models, the optimal lexicon not only drastically improved ASR accuracy but also outperformed the other units, as evidence of the generality of our approach. Experimental results demonstrate that the optimal lexicon obtained by reducing morpho-phonetic confusions exhibits better ASR accuracy and robustness.
Keywords :
hidden Markov models; natural language processing; neural nets; speech processing; speech recognition; ASR accuracy; DNN based acoustic model; HMM based acoustic model; Uyghur ASR; Uyghur text; automatic speech recognition; baseline linguistic unit; deep neural network; inflectional language; language model; morpheme; morpho-phonetic change; morpho-phonetic confusion; optimal lexicon set; phonetic reason; statistical model; sub-word segmentation; training corpora; vocabulary size; Accuracy; Acoustics; Hidden Markov models; Speech; Surface morphology; Training; Vocabulary; ASR; DNN; Uyghur; morphology; phonetic;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Signal and Information Processing (ChinaSIP), 2015 IEEE China Summit and International Conference on
Conference_Location :
Chengdu
Type :
conf
DOI :
10.1109/ChinaSIP.2015.7230422
Filename :
7230422