Title of article :
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Author/Authors :
Homayounpour, M.M. - Computer Engineering and IT Department - Amirkabir University of Technology - Tehran, Iran , Asadolahzade Kermanshahi, M. - Computer Engineering and IT Department - Amirkabir University of Technology - Tehran, Iran
Pages :
11
From page :
137
To page :
147
Abstract :
Phoneme recognition has attracted considerable research attention because of its applications across speech processing. Recent work shows that deep neural networks (DNNs) significantly improve the performance of speech recognition systems. A DNN-based phoneme recognition system has two phases: training and testing. Most previous work has tried to improve the training phase, for example through training algorithms, network types, network architectures, and feature types. In this work, we instead focus on the test phase, in which the phoneme sequence is generated; this step is also essential for good phoneme recognition accuracy. Previous work has generated phoneme sequences by running the Viterbi algorithm on a hidden Markov model (HMM). We address an important problem with this method: an HMM implicitly assumes a geometric distribution of state durations. To deal with this, we use the real duration probability distribution of each phoneme with the aid of a hidden semi-Markov model (HSMM). We also represent each phoneme with a single state so that phoneme duration information can be used simply in the HSMM. Furthermore, we investigate the performance of a post-processing method that corrects the phoneme sequence obtained from the neural network based on prior knowledge about phonemes. Experimental results on the Persian FarsDat corpus show that the extended Viterbi algorithm on the HSMM improves phoneme recognition accuracy by 2.68% and 0.56% over the conventional methods using Gaussian mixture model-hidden Markov models (GMM-HMMs) and Viterbi on HMM, respectively. The post-processing method also increases accuracy relative to the output before it is applied.
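The decoding idea in the abstract can be illustrated with a minimal sketch of explicit-duration (extended) Viterbi decoding, with one state per phoneme. This is not the authors' implementation. The function name `hsmm_viterbi` and the inputs `log_emit` (per-frame log scores for each phoneme, for example derived from DNN outputs), `log_init`, `log_trans`, and `log_dur` (a per-phoneme log duration table) are all assumptions made for illustration. In a plain HMM with self-transition probability a, a state lasts d frames with geometric probability a^(d-1)(1-a); here that implicit term is replaced by an explicit duration table.

```python
import numpy as np

def hsmm_viterbi(log_emit, log_init, log_trans, log_dur):
    """Explicit-duration Viterbi sketch (hypothetical interface).

    log_emit : (T, N) per-frame log scores for each of N phoneme states
    log_init : (N,)   log initial-state probabilities
    log_trans: (N, N) log transition probabilities between different
               phonemes; the diagonal should be -inf because duration
               is modelled by log_dur, not by self-loops
    log_dur  : (N, D) log_dur[j, d-1] = log P(duration = d | phoneme j)
    Returns a list of (phoneme, start_frame, end_frame) with end exclusive.
    """
    T, N = log_emit.shape
    D = log_dur.shape[1]
    # cum[t, j] = sum of log_emit[0:t, j], so segment sums cost O(1)
    cum = np.vstack([np.zeros((1, N)), np.cumsum(log_emit, axis=0)])
    # delta[t, j] = best score of a segmentation of frames 0..t-1
    # whose last segment is phoneme j and ends at frame t-1
    delta = np.full((T + 1, N), -np.inf)
    back_dur = np.zeros((T + 1, N), dtype=int)
    back_prev = np.full((T + 1, N), -1, dtype=int)

    for t in range(1, T + 1):
        for j in range(N):
            for d in range(1, min(D, t) + 1):
                start = t - d
                seg = log_dur[j, d - 1] + cum[t, j] - cum[start, j]
                if start == 0:
                    # first segment of the utterance
                    score, prev = log_init[j] + seg, -1
                else:
                    cand = delta[start] + log_trans[:, j]
                    prev = int(np.argmax(cand))
                    score = cand[prev] + seg
                if score > delta[t, j]:
                    delta[t, j] = score
                    back_dur[t, j] = d
                    back_prev[t, j] = prev

    if not np.isfinite(delta[T]).any():
        raise ValueError("no feasible segmentation for these inputs")

    # Backtrack from the best final phoneme, one whole segment at a time
    segments = []
    t, j = T, int(np.argmax(delta[T]))
    while t > 0:
        d = int(back_dur[t, j])
        segments.append((j, t - d, t))
        t, j = t - d, int(back_prev[t, j])
    segments.reverse()
    return segments
```

The triple loop costs on the order of T x N x D steps with an O(N) maximisation inside, so the maximum duration D (an assumed cap) trades accuracy of the duration model against decoding time. Because each phoneme is a single state, every returned segment corresponds directly to one recognized phoneme and its duration.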
Keywords :
Persian (Farsi) Language , Hidden Semi-Markov Model , Deep Neural Network , Extended Viterbi Algorithm , Phoneme Duration , Hidden Markov Model , Phoneme Recognition
Journal title :
Astroparticle Physics
Serial Year :
2019
Record number :
2452610