DocumentCode :
3713065
Title :
District names speech corpus for Pakistani Languages
Author :
Sahar Rauf;Asima Hameed;Tania Habib;Sarmad Hussain
Author_Institution :
Center for Language Engineering, Al-Khawarizmi Institute of Computer Science, University of Engineering and Technology, Lahore, Pakistan
fYear :
2015
Firstpage :
207
Lastpage :
211
Abstract :
This paper presents a speech corpus developed for an Urdu automatic speech recognition (ASR) system. The corpus comprises single-word utterances from a fixed vocabulary consisting of the district names of Pakistan. The data was recorded over a telephone channel from all over Pakistan to cover six major accents: Punjabi, Urdu, Saraiki, Pashto, Sindhi, and Balochi. The data was collected in challenging acoustic environments; the major issues were silence, background noise, and alternate pronunciations, which can affect the performance of the system. To address these issues, comprehensive data verification and cleaning guidelines are presented. The proposed process serves as a data preprocessing step for the development of the ASR system, which has been successfully integrated into an Urdu dialog system that provides weather information for Pakistan.
Keywords :
Meteorology
Publisher :
IEEE
Conference_Title :
2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)
Type :
conf
DOI :
10.1109/ICSDA.2015.7357893
Filename :
7357893