DocumentCode
1798846
Title
Acoustic modeling for Hindi speech recognition in low-resource settings
Author
Dey, Anamika ; Zhang, Weibin ; Fung, Pascale
Author_Institution
Dept. of Electron. & Comput. Eng., Hong Kong Univ. of Sci. & Technol., Hong Kong, China
fYear
2014
fDate
7-9 July 2014
Firstpage
891
Lastpage
894
Abstract
We propose an approach to acoustic modeling of Hindi speech that borrows English data, with the goal of Hindi large-vocabulary continuous speech recognition (LVCSR). Hindi, like many Indian languages, has a large speaker base, but few resources exist for obtaining large amounts of transcribed Hindi data for LVCSR. We compare a baseline Gaussian model-sharing approach with DNN training. A widely used data-borrowing method for DNNs first trains a DNN on English, for which a large amount of training data is available; the whole DNN, except the last layer, is then fine-tuned on the target Hindi data. We propose a three-stage method: first, map phones between Hindi and English; second, train Hindi acoustic models by sharing data between Hindi-English phone pairs; and finally, fine-tune the acoustic model on the Hindi data. We evaluate and compare these approaches in experiments using 1 hour of transcribed Hindi data and 15 hours of Wall Street Journal English data. The experiments show that the proposed method significantly outperforms conventional baseline models on phone recognition tasks in a low-resource setting.
Keywords
Gaussian processes; acoustic signal processing; feedforward neural nets; hidden Markov models; learning (artificial intelligence); natural language processing; speaker recognition; speech processing; DNN training; GMM; Gaussian mixture models; HMM; Hindi LVCSR; Hindi speech recognition; Hindi-English phone pairs; Indian languages; Wall Street Journal English data; acoustic modeling; baseline Gaussian model-sharing approach; data sharing; deep neural network; feed-forward network; hidden Markov models; low-resource settings; phone recognition tasks; phonetic mapping; Acoustics; Data models; Feature extraction; Hidden Markov models; Speech; Speech recognition; Training; Hindi LVCSR; data borrowing; low resource; phone mapping;
fLanguage
English
Publisher
IEEE
Conference_Titel
Audio, Language and Image Processing (ICALIP), 2014 International Conference on
Conference_Location
Shanghai
Print_ISBN
978-1-4799-3902-2
Type
conf
DOI
10.1109/ICALIP.2014.7009923
Filename
7009923