Hindi speech corpora: A review

Author

Nivedita; Ahmed, P.; Dev, Amita; Agrawal, S.S.

Author_Institution

Sch. of Eng. & Technol., Sharda Univ., Noida, India

fYear

2013

fDate

25-27 Nov. 2013

Firstpage

1

Lastpage

6

Abstract

A benchmark dataset provides insight into the phenomena that generate the data. Hence, it is essential for research that requires concept discovery from data. In this paper, we examine the current status of 26 datasets for Hindi speech (Hindi speech corpora) and study their impact on the development of Hindi speech-based, computer-mediated applications. During this study, we found that researchers have paid little attention to collecting data in realistic environments through mobile phones. Of the 26 Hindi speech corpora reviewed, only one was created for speaker recognition; in it, conversational speech samples are recorded through mobile phones in both noisy and clear conditions.

Keywords

natural language processing; speaker recognition; Hindi speech corpora; benchmark dataset; computer mediated application development; data collection; data discovery; mobile phone; speech samples; Databases; Educational institutions; Microphones; Mobile communication; Mobile handsets; Speech; Speech recognition; recording environment; speech corpora

fLanguage

English

Publisher

IEEE

Conference_Titel

2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)

Conference_Location

Gurgaon

Type

conf

DOI

10.1109/ICSDA.2013.6709872

Filename

6709872