DocumentCode :
180484
Title :
Multilingual MRASTA features for low-resource keyword search and speech recognition systems
Author :
Tüske, Zoltán ; Nolden, David ; Schlüter, Ralf ; Ney, Hermann
Author_Institution :
Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
7854
Lastpage :
7858
Abstract :
This paper investigates the application of hierarchical MRASTA bottleneck (BN) features to under-resourced languages within the IARPA Babel project. Through multilingual training of Multilayer Perceptron (MLP) BN features on five languages (Cantonese, Pashto, Tagalog, Turkish, and Vietnamese), we obtain a single feature stream that is more beneficial to all languages than the unilingual features. With balanced corpus sizes, the multilingual BN features improve automatic speech recognition (ASR) performance by 3-5% and keyword search (KWS) performance by 3-10% relative, for both the limited (LLP) and the full language packs (FLP). When orders of magnitude more data are borrowed from non-target FLPs, the recognition error rate is reduced by 8-10%, and spoken term detection improves by over 40% relative on the Vietnamese and Pashto LLPs. With the goal of rapid acoustic model development, we also investigate cross-lingual transfer of multilingually "pretrained" BN features to a new language. Without any MLP training on the new language, the ported BN features perform similarly to the unilingual features on the FLP and significantly better on the LLP. Results also show that a simple fine-tuning step on the new language is enough to achieve KWS and ASR performance comparable to that of a system in which the target language is also included in the time-consuming multilingual training.
Keywords :
feature extraction; multilayer perceptrons; speech recognition; telecommunication computing; ASR; Cantonese; IARPA Babel project; KWS; Pashto; Tagalog; Turkish; Vietnamese; acoustic models; automatic speech recognition; balanced corpus sizes; full language packs; hierarchical MRASTA bottleneck features; limited language packs; low-resource keyword search; multilayer perceptron; multilingual MRASTA features; recognition error rate; spoken term detection; time-consuming multilingual training; under-resourced languages; Acoustics; Feature extraction; Speech; Speech processing; Speech recognition; Training; ASR; Babel; KWS; MLP; MRASTA; bottleneck; hierarchical; neural network; tandem;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6855129
Filename :
6855129