Title :
Improving the Robustness of Persian Large Vocabulary Continuous Speech Recognition System for Real Applications
Author :
Veisi, H. ; Sameti, H. ; Babaali, B. ; Hosseinzadeh, Kh. ; Manzuri, M.T.
Author_Institution :
Dept. of Comput. Eng., Sharif Univ. of Technol., Tehran
Abstract :
In this paper, vocal tract length normalization (VTLN) and two adaptation methods, maximum likelihood linear regression (MLLR) and maximum a posteriori (MAP) adaptation, were investigated to make a Persian HMM-based, speaker-independent large vocabulary continuous speech recognition system more robust. Robustness to speaker variation and environmental noise was achieved for this system in real-world applications. In the VTLN method, a line-search-based approach was used to find the speakers' relative warping factors. The factors were applied to the signal's spectrum to normalize the variations in vocal tract length between speakers. In the MLLR method, Gaussian mean and variance transformations in full adaptation were tested, using regression-tree-based adaptation in a supervised fashion. The standard MAP method was also tested as an adaptation method to compensate for speaker and environment variations. Combinations of these approaches were evaluated on four different noisy tasks. They yielded a significant improvement in recognition performance in noisy environments, which makes the system operational in real applications.
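The abstract does not give the details of the line search, so the following is only a minimal sketch of how a grid-based search for a VTLN warping factor might look. The grid range (0.88 to 1.12), the step size, the purely linear warp, and the function names `line_search_warp`, `warp_frequency`, and `toy_score` are all assumptions made for illustration; in a real recognizer the score would be the acoustic log-likelihood of the warped features under the HMM, not the toy peak-matching criterion used here.

```python
def line_search_warp(score, lo=0.88, hi=1.12, step=0.02):
    """Return the grid point alpha in [lo, hi] that maximizes score(alpha).

    The grid is uniform; values are rounded to two decimals, so this
    sketch assumes lo and step have at most two decimal places.
    """
    n_steps = int(round((hi - lo) / step))
    grid = [round(lo + i * step, 2) for i in range(n_steps + 1)]
    return max(grid, key=score)


def warp_frequency(freq_hz, alpha):
    """Linear frequency warping: scale the frequency axis by alpha (assumed form)."""
    return alpha * freq_hz


def toy_score(alpha, speaker_peak_hz=550.0, reference_peak_hz=500.0):
    """Hypothetical stand-in for the acoustic log-likelihood.

    Rewards warping factors that move the speaker's spectral peak
    close to the reference peak (higher is better).
    """
    return -(warp_frequency(speaker_peak_hz, alpha) - reference_peak_hz) ** 2


best_alpha = line_search_warp(toy_score)
print(best_alpha)
```

With a real system, `toy_score` would be replaced by a function that warps the speaker's spectra with the candidate factor, recomputes the features, and returns their likelihood under the acoustic model.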
Keywords :
Gaussian processes; hidden Markov models; maximum likelihood estimation; natural languages; regression analysis; speech recognition; trees (mathematics); vocabulary; Gaussian mean transformation; Gaussian variance transformation; Persian hidden Markov model; Persian large vocabulary continuous speech recognition system; adaptation methods; line-search based approach; maximum a posteriori; maximum likelihood linear regression; regression tree-based adaptation; signal's spectrum; speakers' relative warping factors; vocal tract length normalization; Acoustic noise; Degradation; Frequency estimation; Loudspeakers; Maximum likelihood linear regression; Noise robustness; Regression tree analysis; Speech recognition; Vocabulary; Working environment noise
Conference_Title :
Information and Communication Technologies, 2006. ICTTA '06. 2nd
Conference_Location :
Damascus
Print_ISBN :
0-7803-9521-2
DOI :
10.1109/ICTTA.2006.1684565