DocumentCode :
3331326
Title :
Compute & memory optimizations for high-quality speech recognition on low-end GPU processors
Author :
Gupta, Kshitij ; Owens, John D.
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of California, Davis, Davis, CA, USA
fYear :
2011
fDate :
18-21 Dec. 2011
Firstpage :
1
Lastpage :
10
Abstract :
Gaussian Mixture Model (GMM) computations in modern Automatic Speech Recognition systems are known to dominate the total processing time, and are both memory-bandwidth and compute intensive. Graphics processors (GPUs) are well suited for applications exhibiting data- and thread-level parallelism, such as that exhibited by GMM score computations. We have previously presented a theoretical framework that exploits temporal locality over successive frames of speech to modify the traditional speech processing pipeline and obtain significant savings in compute and memory bandwidth requirements, especially on resource-constrained platforms like those found in mobile devices. In this paper we discuss in detail our implementation of two of the three techniques we previously proposed, and suggest a set of guidelines on which technique is suitable for a given condition. For a medium-vocabulary dictation task consisting of 5k words, we are able to reduce memory bandwidth by 80% for a 20% overhead in compute without loss in accuracy by applying the first technique, and to achieve memory and compute savings of 90% and 35% respectively for a 15% degradation in accuracy by using the second technique. We are able to achieve a 4× speed-up (to 6 times real-time performance) over the baseline on a low-end Nvidia 9400M GPU.
Keywords :
Gaussian processes; graphics processing units; speech recognition; GMM score computations; Gaussian mixture model computation; compute intensive; data parallelism; dictation task; high-quality automatic speech recognition system; low-end GPU processors; medium vocabulary; memory bandwidth; memory optimization; resource-constrained devices; speech processing pipeline; successive speech frames; temporal locality; thread-level parallelism; Acoustics; Bandwidth; Computational modeling; Graphics processing unit; Hidden Markov models; Speech;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computing (HiPC), 2011 18th International Conference on
Conference_Location :
Bangalore
Print_ISBN :
978-1-4577-1951-6
Electronic_ISBN :
978-1-4577-1949-3
Type :
conf
DOI :
10.1109/HiPC.2011.6152741
Filename :
6152741