  • DocumentCode
    24653
  • Title
    Multitask Learning of Deep Neural Networks for Low-Resource Speech Recognition
  • Author
    Chen, Dongpeng ; Mak, Brian Kan-Wing
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Hong Kong Univ. of Sci. & Technol., Hong Kong, China
  • Volume
    23
  • Issue
    7
  • fYear
    2015
  • fDate
    Jul-15
  • Firstpage
    1172
  • Lastpage
    1183
  • Abstract
    We propose a multitask learning (MTL) approach to improve low-resource automatic speech recognition using deep neural networks (DNNs) without requiring additional language resources. We first demonstrate that the performance of the phone models of a single low-resource language can be improved by training its grapheme models in parallel under the MTL framework. When multiple low-resource languages are trained together, we investigate learning a universal phone set (UPS) as an additional task, again within the MTL framework, to improve the performance of the phone models of all the languages involved. In both cases, the heuristic guideline is to select a task that may exploit extra information from the training data of the primary task(s). In the first method, the extra information is the phone-to-grapheme mappings, whereas in the second method, the UPS helps to implicitly map the phones of the multiple languages to one another. In a series of experiments using three low-resource South African languages in the Lwazi corpus, the proposed MTL methods obtain significant word recognition gains when compared with single-task learning (STL) of the corresponding DNNs or with ROVER, which combines results from several STL-trained DNNs.
  • Keywords
    learning (artificial intelligence); natural language processing; neural nets; speech recognition; DNN; Lwazi corpus; MTL approach; South African languages; deep neural networks; low-resource speech recognition; multitask learning approach; phone-to-grapheme mappings; single low-resource language; universal phones; Acoustics; Data models; Hidden Markov models; Neural networks; Speech; Training; Deep neural network (DNN); multitask learning; universal grapheme set; universal phone set
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type
    jour
  • DOI
    10.1109/TASLP.2015.2422573
  • Filename
    7084614