DocumentCode :
590871
Title :
Speaking rate dependent multiple acoustic models using continuous frame rate normalization
Author :
Sung Min Ban ; Hyung Soon Kim
Author_Institution :
Pusan Nat. Univ., Busan, South Korea
fYear :
2012
fDate :
3-6 Dec. 2012
Firstpage :
1
Lastpage :
4
Abstract :
This paper proposes a method that uses speaking rate dependent multiple acoustic models for speech recognition. In this method, multiple acoustic models are generated, each corresponding to a different speaking rate. Among them, the acoustic model that best matches the speaking rate of the test data is selected and used for recognition. To simulate the various speaking rates for the multiple acoustic models, we use a variable frame shift size that reflects the speaking rate of each utterance instead of applying a fixed frame shift size to all training utterances. Continuous frame rate normalization (CFRN) is applied to each training utterance to control the frame shift size. Experimental results show that the proposed method outperforms both the baseline and conventional CFRN on test utterances.
Keywords :
speech recognition; CFRN; continuous frame rate normalization; speaking rate dependent multiple acoustic models; test utterances; training utterances; variable frame shift size; Acoustics; Data models; Hidden Markov models; Speech; Speech recognition; Training; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific
Conference_Location :
Hollywood, CA
Print_ISBN :
978-1-4673-4863-8
Type :
conf
Filename :
6412018