DocumentCode :
1872689
Title :
Reducing speech recognition costs: By compressing the input data
Author :
Halavati, Ramin ; Shouraki, Saeed Bagheri
Author_Institution :
Electr. Eng. Dept., Sharif Univ. of Technol., Tehran, Iran
fYear :
2012
fDate :
6-8 Sept. 2012
Firstpage :
102
Lastpage :
107
Abstract :
One of the key constraints on using embedded speech recognition modules is the computational power they require. To decrease this requirement, we propose an algorithm that clusters the speech signal before passing it to the recognition units. The algorithm is based on agglomerative clustering and produces a sequence of compressed frames optimized for recognition. Our experimental results indicate that the proposed method achieves an average frame rate of 40 frames per second on medium- to large-vocabulary isolated word recognition tasks without loss of recognition accuracy, which results in up to 60% faster recognition compared with the usual fixed frame rate of 100 fps. This value is quite close to the theoretically optimal value of 37.5 frames per second, while the best result of former approaches is about 60 frames per second.
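Illustrative sketch (not the authors' implementation): the abstract states only that the method is based on agglomerative clustering and yields a shorter sequence of compressed frames. The Python below shows one simple way such a scheme could work, under assumptions that are not taken from the paper: only temporally adjacent frames are merged, closeness is measured by Euclidean distance between segment means, a merged segment is represented by its count-weighted mean, and merging stops at a fixed target count. The function name `merge_adjacent_frames`, the parameter `target_count`, and the toy data are hypothetical.

```python
import math


def merge_adjacent_frames(frames, target_count):
    """Greedily merge temporally adjacent frames (hypothetical helper).

    frames: list of equal-length feature vectors (lists of floats).
    Returns a list of (mean_vector, n_original_frames) segments.
    """
    if target_count < 1:
        raise ValueError("target_count must be at least 1")
    # Each segment keeps its mean vector and the number of frames it covers.
    segments = [(list(f), 1) for f in frames]
    while len(segments) > target_count:
        # Assumption: pick the adjacent pair whose means are closest
        # in Euclidean distance.
        best = min(
            range(len(segments) - 1),
            key=lambda i: math.dist(segments[i][0], segments[i + 1][0]),
        )
        (a, na), (b, nb) = segments[best], segments[best + 1]
        total = na + nb
        # Assumption: a merged segment is the count-weighted mean of its parts.
        merged = [(x * na + y * nb) / total for x, y in zip(a, b)]
        segments[best:best + 2] = [(merged, total)]
    return segments


# Toy input: ten 2-D "frames" with a few steady stretches (made-up values).
frames = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0],
          [9.0, 1.0], [9.0, 1.1], [9.1, 1.0], [9.0, 0.9],
          [2.0, 8.0]]
compressed = merge_adjacent_frames(frames, target_count=4)
durations = [n for _, n in compressed]          # frames covered per segment
reduction = 1 - len(compressed) / len(frames)   # fraction of frames removed
print(durations, f"{reduction:.0%} fewer frames")
```

The toy ratio of 10 frames to 4 segments was chosen only to mirror the 100 fps to 40 fps ratio quoted in the abstract. It does not reproduce the paper's experiments, and a real system would need a stopping rule tied to recognition accuracy rather than a fixed count.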
Keywords :
encoding; pattern clustering; signal sampling; speech recognition; word processing; embedded speech recognition modules; frame rate sampling; optimal values; optimized compressed frame sequence production; recognition accuracy; speech recognition cost reduction; speech signal agglomerative clustering; vocabulary isolated word recognition tasks; Accuracy; Clustering algorithms; Euclidean distance; Hidden Markov models; Speech; Speech recognition; Vectors; Clustering Methods; Speech Coding; Speech Recognition; Variable Rate Codes;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Systems (IS), 2012 6th IEEE International Conference
Conference_Location :
Sofia
Print_ISBN :
978-1-4673-2276-8
Type :
conf
DOI :
10.1109/IS.2012.6335121
Filename :
6335121