DocumentCode
735033
Title
Reducing morpho-phonetic confusion in sub-word based Uyghur ASR
Author
Ablimit, Mijit ; Hamdulla, Askar ; Pattar, Akbar
Author_Institution
Postdoctoral Res. Station of Comput. Sci. & Technol., Xinjiang Univ., Urumqi, China
fYear
2015
fDate
12-15 July 2015
Firstpage
348
Lastpage
352
Abstract
Sub-word units like morphemes are selected as the lexicon for highly inflectional languages, as they can provide better coverage and a smaller vocabulary size. However, short units shrink the context of statistical models, are prone to morpho-phonetic changes, and do not always outperform the word-based model. When sequences of units are merged or split, unit boundaries are phonetically harmonized in speech, which is reflected as morpho-phonetic changes in the text. This paper investigates morpho-phonetic confusions in the sub-word segmentation of Uyghur text, and the phonetic reasons that affect automatic speech recognition (ASR) accuracy. An optimal lexicon set, which avoids phonetic confusions in the frequently misrecognized morpheme sequences, is obtained by comparing the ASR results of different layers of lexica. This optimal lexicon, which is obtained entirely from an HMM-based acoustic model, outperformed all the baseline linguistic units. When all these units are directly incorporated into a deep neural network (DNN) based acoustic model, without changing the training corpora and language models, the optimal lexicon not only drastically improved the ASR accuracy but also outperformed the other units, as a proof of the generality of our approach. Experimental results demonstrate that the optimal lexicon obtained by reducing morpho-phonetic confusions exhibits better ASR accuracy and robustness.
Keywords
hidden Markov models; natural language processing; neural nets; speech processing; speech recognition; ASR accuracy; DNN based acoustic model; HMM based acoustic model; Uyghur ASR; Uyghur text; automatic speech recognition; baseline linguistic unit; deep neural network; inflectional language; language model; morpheme; morpho-phonetic change; morpho-phonetic confusion; optimal lexicon set; phonetic reason; statistical model; sub-word segmentation; training corpora; vocabulary size; Accuracy; Acoustics; Hidden Markov models; Speech; Surface morphology; Training; Vocabulary; ASR; DNN; Uyghur; morphology; phonetic
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal and Information Processing (ChinaSIP), 2015 IEEE China Summit and International Conference on
Conference_Location
Chengdu
Type
conf
DOI
10.1109/ChinaSIP.2015.7230422
Filename
7230422