Title :
Parallelized feature extraction and acoustic model training
Author :
Haofeng Kou ; Weijia Shang
Author_Institution :
Santa Clara Univ., Santa Clara, CA, USA
Abstract :
In this paper, we present our research on parallelized speech recognition, covering both Mel-Frequency Cepstral Coefficient (MFCC) feature extraction [1] and Viterbi training of Hidden Markov Model (HMM) based acoustic models [2] on Graphics Processing Units (GPUs). Robust and accurate speech recognition systems can only be realized with adequately trained acoustic models derived from effectively extracted features. For common languages, state-of-the-art systems are trained on features extracted from many thousands of hours of speech data, and even with large clusters of machines the entire extraction and training process can take weeks. To overcome this development bottleneck, we demonstrate that feature extraction and acoustic model training are well suited for GPUs and propose an optimized parallel implementation that combines MFCC feature extraction with Viterbi training of the HMM acoustic model. We illustrate the application's concurrency characteristics and data working set sizes, and describe the optimizations required for effective throughput on GPU processors. Using one GTX580, our approach is shown to be approximately 95x faster overall than a sequential CPU implementation at the same accuracy level, enabling feature extraction and acoustic model training to be performed in real time.
Keywords :
feature extraction; graphics processing units; hidden Markov models; speech recognition; GPU processors; GTX580; HMM based acoustic model; MFCC feature extraction; Mel-frequency cepstral coefficient; Viterbi training; acoustic model training; acoustic models; parallelized feature extraction; parallelized speech recognition; speech recognition systems; Instruction sets; Training; CUDA; Continuous Speech Recognition; GPU; HMM
Conference_Title :
Digital Signal Processing (DSP), 2014 19th International Conference on
Conference_Location :
Hong Kong
DOI :
10.1109/ICDSP.2014.6900717