• DocumentCode
    70615
  • Title

    A Generic and Scalable Architecture for a Large Acoustic Model and Large Vocabulary Speech Recognition Accelerator Using Logic on Memory

  • Author

    Bapat, Ojas A. ; Franzon, Paul D. ; Fastow, Richard M.

  • Author_Institution
    Spansion Inc., Sunnyvale, CA, USA
  • Volume
    22
  • Issue
    12
  • fYear
    2014
  • fDate
    Dec. 2014
  • Firstpage
    2701
  • Lastpage
    2712
  • Abstract
    This paper describes a scalable hardware accelerator for speech recognition that uses a two-pass decoding algorithm with word-dependent N-best Viterbi beam search. The observation probability calculation (Senone scoring) and the first pass of decoding, which uses a Bigram language model, are implemented in hardware. The word lattice output from the first pass is used by software for the second pass, with a trigram language model. The proposed design uses a logic-on-memory approach to exploit high-bandwidth NOR flash memory, improving random read performance for Senone scoring and first-pass decoding, both of which are memory-intensive operations. The proposed HW/SW co-design achieves an overall speedup of 4.3x over a 2.4-GHz Intel Core 2 Duo processor running the CMU Sphinx speech recognition software, while consuming an estimated 1.72 W of power. The hardware accelerator provides improved speech recognition accuracy by supporting larger acoustic models and word dictionaries while maintaining real-time performance.
  • Keywords
    hardware-software codesign; logic circuits; logic design; speech coding; speech recognition; Bigram language model; CMU Sphinx speech recognition software; HW/SW co-design; Intel Core 2 Duo processor; Senone scoring; acoustic models; first pass decoding; flash memory; frequency 2.4 GHz; generic scalable architecture; large acoustic model; large vocabulary speech recognition accelerator; logic-on-memory approach; memory intensive operations; observation probability calculation; power 1.72 W; read performance; scalable hardware accelerator; trigram language model; two pass decoding algorithm; word dependent N-best Viterbi beam search; word dictionaries; word lattice output; Acoustic beams; Acoustics; Decoding; Hardware; Hidden Markov models; Software; Speech recognition; Accelerator; N-best; beam search; embedded; hardware software co-design; logic on memory; multipass decoding; speech recognition; sphinx
  • fLanguage
    English
  • Journal_Title
    Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-8210
  • Type

    jour

  • DOI
    10.1109/TVLSI.2013.2296526
  • Filename
    6718087