Regional Information Center for Science and Technology - Parallelized feature extraction and acoustic model training

DocumentCode :

239540

Title :

Parallelized feature extraction and acoustic model training

Author :

Haofeng Kou ; Weijia Shang

Author_Institution :

Santa Clara Univ., Santa Clara, CA, USA

fYear :

2014

fDate :

20-23 Aug. 2014

Firstpage :

503

Lastpage :

508

Abstract :

In this paper, we present our research on parallelized speech recognition, covering both Mel-Frequency Cepstral Coefficient (MFCC) feature extraction [1] and Viterbi training of Hidden Markov Model (HMM) based acoustic models [2] on Graphics Processing Units (GPUs). Robust and accurate speech recognition systems can only be realized with adequately trained acoustic models derived from effectively parsed features. For common languages, state-of-the-art systems extract features from and train on many thousands of hours of speech data, and even with large clusters of machines the entire extraction and training process can take weeks. To overcome this development bottleneck, we demonstrate that feature extraction and acoustic model training are well suited to GPUs and propose an optimized parallel implementation that combines MFCC feature extraction with Viterbi training of the HMM acoustic model. We illustrate the application's concurrency characteristics and data working set sizes, and describe the optimizations required for effective throughput on GPU processors. Using one GTX580, our approach is shown to be approximately 95x faster overall than a sequential CPU implementation at the same accuracy level, enabling feature extraction and acoustic model training to be performed in real time.

Keywords :

feature extraction; graphics processing units; hidden Markov models; speech recognition; GPU processors; GTX580; HMM based acoustic model; MFCC feature extraction; Mel-frequency cepstral coefficient; Viterbi training; acoustic model training; acoustic models; hidden Markov model; parallelized feature extraction; parallelized speech recognition; speech recognition systems; Instruction sets; Training; CUDA; Continuous Speech Recognition; GPU; HMM

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Digital Signal Processing (DSP), 2014 19th International Conference on

Conference_Location :

Hong Kong

Type :

conf

DOI :

10.1109/ICDSP.2014.6900717

Filename :

6900717

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=239540