  • DocumentCode
    1954721
  • Title
    A parallel implementation of Viterbi training for acoustic models using graphics processing units
  • Author
    Buthpitiya, Senaka ; Lane, Ian ; Chong, Jike
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA
  • fYear
    2012
  • fDate
    13-14 May 2012
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    Robust and accurate speech recognition systems can only be realized with adequately trained acoustic models. For common languages, state-of-the-art systems are trained on many thousands of hours of speech data, and even with large clusters of machines, the entire training process can take many weeks. To overcome this development bottleneck, we propose a parallel implementation of Viterbi training optimized for training hidden Markov model (HMM)-based acoustic models using highly parallel graphics processing units (GPUs). In this paper, we introduce Viterbi training, illustrate its application concurrency characteristics and data working set sizes, and describe the optimizations required for effective throughput on GPU processors. We demonstrate that the acoustic model training process is well suited to GPUs. Using a single NVIDIA GTX580 GPU, our proposed approach is shown to be 94.8× faster than a sequential CPU implementation, enabling a moderately sized acoustic model to be trained on 1000 hours of speech data in under 7 hours. Moreover, we show that our implementation on a two-GPU system can perform 3.3× faster than a standard parallel reference implementation on a high-end 32-core Xeon server at 1/15th the cost. Our GPU-based training platform empowers research groups to rapidly evaluate new ideas and build accurate and robust acoustic models on very large training corpora at nominal cost.
  • Keywords
    acoustic signal processing; computer based training; graphics processing units; hidden Markov models; speech recognition; GPU-based training platform; HMM; NVIDIA GTX580 GPU; Viterbi training; acoustic model training process; application concurrency characteristics; data working set sizes; effective throughput; hidden-Markov-model-based acoustic models; parallel implementation; speech recognition systems; Acoustics; Computational modeling; Concurrent computing; Graphics processing unit; Hidden Markov models; Instruction sets; Training; Acoustic Model Training; Continuous Speech Recognition; Graphics Processing Unit
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Innovative Parallel Computing (InPar), 2012
  • Conference_Location
    San Jose, CA
  • Print_ISBN
    978-1-4673-2632-2
  • Electronic_ISBN
    978-1-4673-2631-5
  • Type
    conf
  • DOI
    10.1109/InPar.2012.6339590
  • Filename
    6339590