Title :
Speech recognition for resource deficient languages using frugal speech corpus
Author :
Imran, Ahmed ; Sunil, K.
Author_Institution :
TCS Innovation Labs. - Mumbai, Thane, India
Abstract :
Building speech recognition applications for resource-deficient languages is a challenge because a speech corpus is unavailable. A speech corpus is a central element in training the acoustic models used in a speech recognition engine, but constructing one for a language is an expensive, time-consuming and laborious process. This paper presents a mechanism for developing an inexpensive (frugal) speech corpus for the resource-deficient languages Indian English and Hindi by exploiting existing collections of online speech data. For the purpose of demonstration, we use online audio news archives to build the frugal speech corpus. We then use this corpus to train acoustic models and evaluate the performance of speech recognition on Indian English and Hindi speech.
Keywords :
speech recognition; English languages; Hindi languages; Indian languages; acoustic models; frugal speech corpus; laborious process; online audio; online speech data; resource deficient languages; speech recognition engine; time consuming process; Acoustics; Adaptation models; Atmospheric modeling; Data models; Speech; Speech processing; Speech recognition; acoustic model; resource deficient; speech corpus;
Conference_Title :
Signal Processing, Communication and Computing (ICSPCC), 2012 IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4673-2192-1
DOI :
10.1109/ICSPCC.2012.6335664