DocumentCode :
672866
Title :
Development of a standard text and speech corpus for the Punjabi language
Author :
Dhanjal, Surinder ; Bhatia, Satvinder Singh
Author_Institution :
Dept. of Comput. Sci., Thompson Rivers Univ., Kamloops, BC, Canada
fYear :
2013
fDate :
25-27 Nov. 2013
Firstpage :
1
Lastpage :
6
Abstract :
This paper presents a new text and speech corpus for the Punjabi language, a modern Indo-Aryan language. Punjabi has been ranked among the most widely spoken languages in the world, with its position varying between 10th and 18th over the years, so research on Punjabi has international significance. Punjabi is the native language of the Punjab region, which is divided between two countries: East Punjab in India and West Punjab in Pakistan. The language has many dialects and is written in a different script in each country. Designing a single text or speech corpus that fully covers every dialect in both scripts would be an enormous task. This work therefore concentrates on only one dialect: the Malwai dialect. The paper describes at least 20 unique features of the newly designed corpus.
Keywords :
natural languages; speech processing; text analysis; East Punjab; India; Indo-Aryan language; Malwai dialect; Punjabi language; West Punjab; speech corpus; standard text corpus development; Agriculture; Animals; Cities and towns; Databases; Speech; Speech processing; Vegetation; Corpora development; Gurmukhi Script; IPA; Malwa; Malwai Dialect; Punjabi language; Speech corpus; Speech processing; Text corpus;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013 International Conference
Conference_Location :
Gurgaon
Type :
conf
DOI :
10.1109/ICSDA.2013.6709891
Filename :
6709891