Regional Information Center for Science and Technology (RICEST) - Multilingual MRASTA features for low-resource keyword search and speech recognition systems

DocumentCode :

180484

Title :

Multilingual MRASTA features for low-resource keyword search and speech recognition systems

Author :

Tüske, Zoltán ; Nolden, David ; Schlüter, Ralf ; Ney, Hermann

Author_Institution :

Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany

Year :

2014

Date :

4-9 May 2014

Firstpage :

7854

Lastpage :

7858

Abstract :

This paper investigates the application of hierarchical MRASTA bottleneck (BN) features to under-resourced languages within the IARPA Babel project. Multilingual training of multilayer perceptron (MLP) BN features on five languages (Cantonese, Pashto, Tagalog, Turkish, and Vietnamese) yields a single feature stream that benefits every language more than the corresponding unilingual features. With balanced corpus sizes, the multilingual BN features improve automatic speech recognition (ASR) performance by 3-5% and keyword search (KWS) performance by 3-10% relative, for both limited (LLP) and full language packs (FLP). When orders of magnitude more data are borrowed from non-target FLPs, the recognition error rate is reduced by 8-10% and spoken term detection is improved by over 40% relative on the Vietnamese and Pashto LLPs. With a view to the fast development of acoustic models, cross-lingual transfer of multilingually "pretrained" BN features to a new language is also investigated. Without any MLP training on the new language, the ported BN features perform similarly to the unilingual features on the FLP and significantly better on the LLP. The results also show that a simple fine-tuning step on the new language is enough to reach KWS and ASR performance comparable to that of a system in which the target language is included in the time-consuming multilingual training.

Keywords :

feature extraction; multilayer perceptrons; speech recognition; telecommunication computing; ASR; Cantonese; IARPA Babel project; KWS; Pashto; Tagalog; Turkish; Vietnamese; acoustic models; automatic speech recognition; balanced corpus sizes; full language packs; hierarchical MRASTA bottleneck features; limited language packs; low-resource keyword search; multilayer perceptron; multilingual MRASTA features; recognition error rate; spoken term detection; time-consuming multilingual training; under-resourced languages; Acoustics; Feature extraction; Speech; Speech processing; Speech recognition; Training; Babel; MLP; MRASTA; bottleneck; hierarchical; neural network; tandem

Language :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location :

Florence

Type :

conf

DOI :

10.1109/ICASSP.2014.6855129

Filename :

6855129

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=180484