Title :
Accelerating the Training of HTK on GPU with CUDA
Author :
Du, Zhihui ; Li, Xiangyu ; Wu, Ji
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
Abstract :
The training procedure of Hidden Markov Model (HMM) based speech recognition is often very time-consuming because of its high computational complexity. Modern parallel hardware such as the GPU offers massive multithreading and very high floating-point throughput. We exploit the GPU to accelerate a popular HMM-based speech recognition toolkit, HTK. Starting from the sequential HTK code, we design "paraTraining", a parallel training model for HTK, and develop several optimization methods to improve its performance on the GPU: unrolling nested loops and using a "reduction add" to maximize the number of threads per block, exploiting the GPU's warp mechanism to reduce synchronization latency, and building different thread indices to address data efficiently. Experimental results show that a speedup of more than 20x can be achieved without loss of accuracy. We also discuss a multi-GPU implementation of our method, which achieves roughly twice the speedup of the single-GPU version.
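The "reduction add" combined with warp-level synchronization that the abstract describes is a standard CUDA pattern; the following is a minimal sketch of that pattern, not HTK's actual code (the kernel name, block size, and buffers are illustrative). It relies on the implicit warp synchronicity of the Fermi-era GPUs the paper targets; on later architectures the warp-level steps would need __syncwarp() or warp shuffle intrinsics.

    #include <cuda_runtime.h>

    #define BLOCK_SIZE 256  // illustrative block size

    // Sums n floats into one partial sum per block ("reduction add").
    __global__ void reductionAdd(const float *in, float *out, int n) {
        __shared__ float sdata[BLOCK_SIZE];
        unsigned tid = threadIdx.x;
        unsigned i = blockIdx.x * blockDim.x + tid;

        // Load one element per thread; pad out-of-range threads with 0.
        sdata[tid] = (i < n) ? in[i] : 0.0f;
        __syncthreads();

        // Tree reduction in shared memory: halve the active threads
        // each step, with a block-wide barrier between steps.
        for (unsigned s = blockDim.x / 2; s > 32; s >>= 1) {
            if (tid < s) sdata[tid] += sdata[tid + s];
            __syncthreads();
        }

        // Final steps run inside a single warp, whose threads execute
        // in lockstep on this hardware, so no __syncthreads() is needed
        // (the "warp mechanism" used to reduce synchronization latency).
        // volatile keeps the compiler from caching shared-memory reads.
        if (tid < 32) {
            volatile float *v = sdata;
            v[tid] += v[tid + 32];
            v[tid] += v[tid + 16];
            v[tid] += v[tid + 8];
            v[tid] += v[tid + 4];
            v[tid] += v[tid + 2];
            v[tid] += v[tid + 1];
        }
        if (tid == 0) out[blockIdx.x] = sdata[0];
    }

Keeping every thread in the block busy during the load and early reduction steps is what lets this pattern maximize the number of threads per block, as the abstract claims.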
Keywords :
computational complexity; floating-point arithmetic; graphics processing units; hidden Markov models; multithreading; optimisation; parallel architectures; speech recognition; training; CUDA; GPU warp mechanism; HMM-based speech recognition training; HTK training; floating-point capability; nested loops; optimization methods; paraTraining design; parallel hardware; parallel training model; performance improvement; reduction add; sequential code; synchronization latency reduction; thread indices; computational modeling; instruction sets; vectors; data parallel computing; GPU computing; stream processor;
Conference_Title :
2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW)
Conference_Location :
Shanghai, China
Print_ISBN :
978-1-4673-0974-5
DOI :
10.1109/IPDPSW.2012.235