• DocumentCode
    3539804
  • Title
    Efficient use of training data for Sinhala speech recognition using active learning
  • Author
    Nadungodage, Thilini ; Weerasinghe, Ruvan ; Niranjan, Mahesan
  • Author_Institution
    Sch. of Comput., Language Technol. Res. Lab., Univ. of Colombo, Colombo, Sri Lanka
  • fYear
    2013
  • fDate
    11-15 Dec. 2013
  • Firstpage
    149
  • Lastpage
    153
  • Abstract
    Automatic Speech Recognition requires a large amount of training data. Collecting such quantities of data involves significant time and cost, owing to the tedious nature of collecting speech recordings and the manual nature of transcribing them. For a low-resourced language such as Sinhala, collecting a sufficient data set is a major problem. To address this issue, we used Active Learning, a Machine Learning technique that has been applied to many tasks such as information retrieval. Our experiment using a simple Sinhala speech corpus shows that, through the use of Active Learning, the number of utterances that need to be transcribed can be reduced by some 42% while achieving the same accuracy as using the whole data set without such a strategy. This suggests that Active Learning techniques can be successfully applied to make optimal use of scarce resources for speech recognition in new languages.
  • Keywords
    information retrieval; learning (artificial intelligence); speech recognition; Sinhala speech corpus; Sinhala speech recognition; active learning technique; automatic speech recognition; information retrieval; machine learning; training data; Accuracy; Acoustics; Computational modeling; Speech; Speech recognition; Training; Vocabulary; ASR; Active Learning; Automatic Speech Recognition; Confidence scoring; Information extraction; Low Resourced Languages; NLP; Natural Language Processing; Sinhala; Word posterior probabilities;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advances in ICT for Emerging Regions (ICTer), 2013 International Conference on
  • Conference_Location
    Colombo
  • Print_ISBN
    978-1-4799-1275-9
  • Type
    conf
  • DOI
    10.1109/ICTer.2013.6761170
  • Filename
    6761170