Title :
Multi-speaker, narrowband, continuous Marathi speech database
Author :
Godambe, Tejas ; Bondale, Nandini ; Samudravijaya, K. ; Rao, Prahlada
Author_Institution :
Sch. of Technol. & Comput. Sci., Tata Inst. of Fundamental Res., Mumbai, India
Abstract :
We describe the development of a continuous speech database in the Marathi language. Speech data was collected from about 1500 literate speakers from 34 districts of Maharashtra, covering a variety of characteristics such as age group, gender, mother tongue and educational qualification. The subjects called the data acquisition system from personal mobile handsets and read specially designed sentence sets. The sentence data acquisition process was conducted in the field rather than in a quiet environment. As a result, the acquired speech data captured a large amount of nonspeech sounds as well as incompletely spoken words. The speech data was therefore transcribed with additional labels to denote frequently occurring nonspeech sounds, different kinds of incomplete words, and invalid words. We characterize the database in terms of statistics such as the gender distribution of speakers, phonemic richness, the amount of nonspeech sounds, and average sentence and word lengths for both reference and actual sentences.
Keywords :
natural language processing; speech recognition; Maharashtra; Marathi language; continuous Marathi speech database; data acquisition system; gender distribution; multispeaker speech database; narrowband Marathi speech database; nonspeech sounds; personal mobile handsets; phonemic richness; sentence data acquisition process; specially designed sentence sets; Data acquisition; Databases; Educational institutions; Mobile communication; Narrowband; Speech; Wideband; Marathi; speech data; transcription
Conference_Title :
2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)
Conference_Location :
Gurgaon
DOI :
10.1109/ICSDA.2013.6709844