  • DocumentCode
    180484
  • Title
    Multilingual MRASTA features for low-resource keyword search and speech recognition systems
  • Author
    Tüske, Zoltán ; Nolden, David ; Schlüter, Ralf ; Ney, Hermann
  • Author_Institution
    Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    7854
  • Lastpage
    7858
  • Abstract
    This paper investigates the application of hierarchical MRASTA bottleneck (BN) features to under-resourced languages within the IARPA Babel project. Through multilingual training of Multilayer Perceptron (MLP) BN features on five languages (Cantonese, Pashto, Tagalog, Turkish, and Vietnamese), we obtain a single feature stream that is more beneficial to all languages than the corresponding unilingual features. In the case of balanced corpus sizes, the multilingual BN features improve automatic speech recognition (ASR) performance by 3-5% and keyword search (KWS) by 3-10% relative for both limited (LLP) and full language packs (FLP). By borrowing orders of magnitude more data from non-target FLPs, we reduce the recognition error rate by 8-10% and improve spoken term detection by over 40% relative on the Vietnamese and Pashto LLPs. Aiming at fast development of acoustic models, we also investigate cross-lingual transfer of multilingually "pretrained" BN features to a new language. Without any MLP training on the new language, the ported BN features perform similarly to the unilingual features on the FLP and significantly better on the LLP. Results also show that a simple fine-tuning step on the new language is enough to achieve KWS and ASR performance comparable to that of a system in which the target language is also included in the time-consuming multilingual training.
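    The abstract describes bottleneck MLP features trained jointly on several languages and later ported or fine-tuned for a new one. The sketch below illustrates the general idea of such a multilingual bottleneck network in PyTorch: shared hidden layers feed a narrow bottleneck whose activations are used as tandem features, with one output layer per training language. The framework choice, layer sizes, and all names are illustrative assumptions, not details taken from the paper.

    # Minimal sketch (not the authors' implementation) of a multilingual
    # bottleneck MLP: shared hidden layers, a narrow bottleneck whose
    # activations serve as tandem features, and one classification head
    # per training language. All dimensions below are made up.
    import torch
    import torch.nn as nn

    class MultilingualBottleneckMLP(nn.Module):
        def __init__(self, input_dim, hidden_dim, bottleneck_dim, targets_per_language):
            super().__init__()
            # Shared feed-forward stack, trained jointly on all languages.
            self.shared = nn.Sequential(
                nn.Linear(input_dim, hidden_dim), nn.Sigmoid(),
                nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
                nn.Linear(hidden_dim, bottleneck_dim),  # bottleneck layer
            )
            # One softmax head per language (e.g. context-dependent states).
            self.heads = nn.ModuleDict({
                lang: nn.Linear(bottleneck_dim, n_targets)
                for lang, n_targets in targets_per_language.items()
            })

        def forward(self, x, lang):
            bn = self.shared(x)              # bottleneck activations
            return self.heads[lang](bn), bn  # logits for training, BN features for tandem use

    # Illustrative instantiation and forward pass with dummy input.
    model = MultilingualBottleneckMLP(
        input_dim=500, hidden_dim=2000, bottleneck_dim=60,
        targets_per_language={"cantonese": 3000, "pashto": 3000, "tagalog": 3000,
                              "turkish": 3000, "vietnamese": 3000},
    )
    logits, bn_features = model(torch.randn(8, 500), lang="pashto")

    Porting to a new language would, under these assumptions, amount to reusing the shared stack unchanged and either extracting bn_features directly or attaching a fresh head and fine-tuning briefly on the new language's data.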
  • Keywords
    feature extraction; multilayer perceptrons; speech recognition; telecommunication computing; ASR; Cantonese; IARPA Babel project; KWS; Pashto; Tagalog; Turkish; Vietnamese; acoustic models; automatic speech recognition; balanced corpus sizes; full language packs; hierarchical MRASTA bottleneck features; limited language packs; low-resource keyword search; multilayer perceptron; multilingual MRASTA features; recognition error rate; spoken term detection; time-consuming multilingual training; under-resourced languages; Acoustics; Feature extraction; Speech; Speech processing; Speech recognition; Training; ASR; Babel; KWS; MLP; MRASTA; bottleneck; hierarchical; neural network; tandem;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6855129
  • Filename
    6855129