DocumentCode
672866
Title
Development of a standard text and speech corpus for the Punjabi language
Author
Dhanjal, Surinder; Bhatia, Satvinder Singh
Author_Institution
Dept. of Comput. Sci., Thompson Rivers Univ., Kamloops, BC, Canada
fYear
2013
fDate
25-27 Nov. 2013
Firstpage
1
Lastpage
6
Abstract
This paper presents a new text and speech corpus developed for the Punjabi language. Punjabi is a modern Indo-Aryan language and ranks among the most widely spoken languages in the world; over the years, its ranking has varied between 10th and 18th. Research on Punjabi therefore has international significance. Punjabi is the native language of the Punjab region, which spans two countries: East Punjab in India and West Punjab in Pakistan. The language has many dialects and is written in two different scripts, one in each country. Designing a single text or speech corpus that fully covers every dialect in both scripts would be an enormous task, so this work concentrates on one dialect of Punjabi: the Malwai dialect. The paper describes at least 20 unique features of the newly designed corpus.
Keywords
natural languages; speech processing; text analysis; East Punjab; India; Indo-Aryan language; Malwai dialect; Punjabi language; West Punjab; speech corpus; standard text corpus development; Agriculture; Animals; Cities and towns; Databases; Speech; Vegetation; Corpora development; Gurmukhi Script; IPA; Malwa; Speech corpus; Text corpus
fLanguage
English
Publisher
IEEE
Conference_Titel
Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013 International Conference
Conference_Location
Gurgaon
Type
conf
DOI
10.1109/ICSDA.2013.6709891
Filename
6709891