Regional Information Center for Science and Technology - Compute & memory optimizations for high-quality speech recognition on low-end GPU processors

DocumentCode :

3331326

Title :

Compute & memory optimizations for high-quality speech recognition on low-end GPU processors

Author :

Gupta, Kshitij ; Owens, John D.

Author_Institution :

Dept. of Electr. & Comput. Eng., Univ. of California, Davis, Davis, CA, USA

fYear :

2011

fDate :

18-21 Dec. 2011

Firstpage :

Lastpage :

Abstract :

Gaussian Mixture Model (GMM) computations in modern Automatic Speech Recognition systems are known to dominate the total processing time, and are both memory-bandwidth and compute intensive. Graphics processors (GPUs) are well suited for applications exhibiting data- and thread-level parallelism, such as that exhibited by GMM score computations. We have previously presented a theoretical framework that exploits temporal locality over successive frames of speech to modify the traditional speech processing pipeline and obtain significant savings in compute and memory bandwidth requirements, especially on resource-constrained hardware like that found in mobile devices. In this paper we discuss in detail our implementation of two of the three techniques we previously proposed, and suggest a set of guidelines for which technique is suitable under a given condition. For a medium-vocabulary dictation task consisting of 5k words, we are able to reduce memory bandwidth by 80% for a 20% overhead in compute without loss in accuracy by applying the first technique, and to achieve memory and compute savings of 90% and 35% respectively for a 15% degradation in accuracy by using the second technique. We are able to achieve a 4× speed-up over the baseline (to 6 times real-time performance) on a low-end Nvidia 9400M GPU.
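The abstract does not spell out the two techniques, so the following is only a minimal sketch of the general idea it alludes to: GMM scoring is dominated by reading model parameters, and scoring several successive frames against a Gaussian while its parameters are loaded reduces how often those parameters are fetched. The diagonal-covariance form, the function names, and the `block` parameter are assumptions made for illustration and are not taken from the paper.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    acc = 0.0
    for xi, mi, vi in zip(x, mean, var):
        acc += math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi
    return -0.5 * acc

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def score_frames_blocked(frames, weights, means, variances, block=4):
    """Score frames against one GMM, visiting each Gaussian once per block.

    Returns (per-frame log-likelihoods, number of parameter loads).
    `block` is a hypothetical knob: how many successive frames share
    one fetch of a Gaussian's parameters.
    """
    n = len(frames)
    per_frame = [[] for _ in range(n)]
    param_loads = 0
    for start in range(0, n, block):
        chunk = range(start, min(start + block, n))
        for w, mean, var in zip(weights, means, variances):
            param_loads += 1  # parameters fetched once for the whole chunk
            log_w = math.log(w)
            for t in chunk:
                per_frame[t].append(log_w + log_gaussian_diag(frames[t], mean, var))
    scores = [logsumexp(components) for components in per_frame]
    return scores, param_loads
```

With `block=1` this reduces to conventional frame-by-frame scoring. Larger blocks produce the same scores with proportionally fewer parameter loads, at the cost of buffering frames before they are scored.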

Keywords :

Gaussian processes; graphics processing units; speech recognition; GMM score computations; Gaussian mixture model computation; compute intensive; data parallelism; dictation task; high-quality automatic speech recognition system; low-end GPU processors; medium vocabulary; memory bandwidth; memory optimization; resource-constrained devices; speech processing pipeline; successive speech frames; temporal locality; thread-level parallelism; Acoustics; Bandwidth; Computational modeling; Graphics processing unit; Hidden Markov models; Speech;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

High Performance Computing (HiPC), 2011 18th International Conference on

Conference_Location :

Bangalore

Print_ISBN :

978-1-4577-1951-6

Electronic_ISBN :

978-1-4577-1949-3

Type :

conf

DOI :

10.1109/HiPC.2011.6152741

Filename :

6152741

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3331326