DocumentCode
3539804
Title
Efficient use of training data for Sinhala speech recognition using active learning
Author
Nadungodage, Thilini ; Weerasinghe, Ruvan ; Niranjan, Mahesan
Author_Institution
Sch. of Comput., Language Technol. Res. Lab., Univ. of Colombo, Colombo, Sri Lanka
fYear
2013
fDate
11-15 Dec. 2013
Firstpage
149
Lastpage
153
Abstract
Automatic Speech Recognition requires a large amount of training data. Collecting such quantities of data involves significant time and cost, owing to the tedious nature of collecting speech recordings and the manual nature of transcribing them. For a low-resourced language such as Sinhala, collecting a sufficient data set is a major problem. To address this issue, we used Active Learning, a Machine Learning technique that has been applied to many tasks such as information retrieval. Our experiment on a simple Sinhala speech corpus shows that, with Active Learning, the number of utterances that need to be transcribed can be reduced by some 42% while achieving the same accuracy as training on the whole data set without such a strategy. This suggests that Active Learning techniques can be successfully applied to make optimal use of scarce resources for speech recognition in new languages.
Keywords
information retrieval; learning (artificial intelligence); speech recognition; Sinhala speech corpus; Sinhala speech recognition; active learning technique; automatic speech recognition; information retrieval; machine learning; training data; Accuracy; Acoustics; Computational modeling; Speech; Speech recognition; Training; Vocabulary; ASR; Active Learning; Automatic Speech Recognition; Confidence scoring; Information extraction; Low Resourced Languages; NLP; Natural Language Processing; Sinhala; Word posterior probabilities;
fLanguage
English
Publisher
IEEE
Conference_Titel
Advances in ICT for Emerging Regions (ICTer), 2013 International Conference on
Conference_Location
Colombo
Print_ISBN
978-1-4799-1275-9
Type
conf
DOI
10.1109/ICTer.2013.6761170
Filename
6761170