Acoustic modeling for Hindi speech recognition in low-resource settings

Author

Dey, Anamika ; Zhang, Weibin ; Fung, Pascale

Author_Institution

Dept. of Electron. & Comput. Eng., Hong Kong Univ. of Sci. & Technol., Hong Kong, China

fYear

2014

fDate

7-9 July 2014

Firstpage

891

Lastpage

894

Abstract

We propose an approach to acoustic modeling of Hindi speech that borrows English data, for the purpose of Hindi large-vocabulary continuous speech recognition (LVCSR). Hindi, like many Indian languages, has a large speaker base, but few resources have been available for collecting large amounts of transcribed Hindi data for LVCSR. We compare a baseline Gaussian model-sharing approach with deep neural network (DNN) training. A widely used data-borrowing method with DNNs is to first train a DNN on English, for which a large amount of training data is available, and then fine-tune the whole DNN, except the last layer, on the target Hindi data. We propose to perform phonetic mapping between Hindi and English in the first stage, train Hindi acoustic models by sharing data between Hindi-English phone pairs in the second stage, and finally fine-tune the acoustic model on the Hindi data. We evaluate and compare these approaches in experiments using 1 hour of transcribed Hindi data and 15 hours of Wall Street Journal English data. The experiments show that the proposed method significantly outperforms conventional baseline models in a low-resource setting on phone recognition tasks.

Keywords

Gaussian processes; acoustic signal processing; feedforward neural nets; hidden Markov models; learning (artificial intelligence); natural language processing; speaker recognition; speech processing; DNN training; GMM; Gaussian mixture models; HMM; Hindi LVCSR; Hindi speech recognition; Hindi-English phone pairs; Indian languages; Wall Street Journal English data; acoustic modeling; baseline Gaussian model-sharing approach; data sharing; deep neural network; feed-forward network; low-resource settings; phone recognition tasks; phonetic mapping; Acoustics; Data models; Feature extraction; Hidden Markov models; Speech; Speech recognition; Training; data borrowing; low resource; phone mapping

fLanguage

English

Publisher

IEEE

Conference_Titel

Audio, Language and Image Processing (ICALIP), 2014 International Conference on

Conference_Location

Shanghai

Print_ISBN

978-1-4799-3902-2

Type

conf

DOI

10.1109/ICALIP.2014.7009923

Filename

7009923