Reducing morpho-phonetic confusion in sub-word based Uyghur ASR

Author

Ablimit, Mijit ; Hamdulla, Askar ; Pattar, Akbar

Author_Institution

Postdoctoral Res. Station of Comput. Sci. & Technol., Xinjiang Univ., Urumqi, China

fYear

2015

fDate

12-15 July 2015

Firstpage

348

Lastpage

352

Abstract

Sub-word units such as morphemes are selected as the lexicon for highly inflectional languages, as they can provide better coverage and a smaller vocabulary size. However, short units shrink the context of statistical models, are prone to morpho-phonetic changes, and do not always outperform the word-based model. When sequences of units are merged or split, unit boundaries are phonetically harmonized in speech, which is reflected as morpho-phonetic changes in the text. This paper investigates morpho-phonetic confusions in the sub-word segmentation of Uyghur text and the phonetic reasons that affect automatic speech recognition (ASR) accuracy. An optimal lexicon set, which avoids phonetic confusions in frequently misrecognized morpheme sequences, is obtained by comparing the ASR results of different layers of lexica. This optimal lexicon, which is obtained entirely from an HMM-based acoustic model, outperformed all the baseline linguistic units. When all these units are directly incorporated into a deep neural network (DNN) based acoustic model, without changing the training corpora and language models, the optimal lexicon not only drastically improved the ASR accuracy but also outperformed the other units, demonstrating the generality of our approach. Experimental results demonstrate that the optimal lexicon obtained by reducing morpho-phonetic confusions exhibits better ASR accuracy and robustness.

Keywords

hidden Markov models; natural language processing; neural nets; speech processing; speech recognition; ASR accuracy; DNN based acoustic model; HMM based acoustic model; Uyghur ASR; Uyghur text; automatic speech recognition; baseline linguistic unit; deep neural network; inflectional language; language model; morpheme; morpho-phonetic change; morpho-phonetic confusion; optimal lexicon set; phonetic reason; statistical model; sub-word segmentation; training corpora; vocabulary size; Accuracy; Acoustics; Hidden Markov models; Speech; Surface morphology; Training; Vocabulary; ASR; DNN; Uyghur; morphology; phonetic

fLanguage

English

Publisher

IEEE

Conference_Titel

Signal and Information Processing (ChinaSIP), 2015 IEEE China Summit and International Conference on

Conference_Location

Chengdu

Type

conf

DOI

10.1109/ChinaSIP.2015.7230422

Filename

7230422