DocumentCode :
591475
Title :
Statistical analysis of Hindi BTEC speech database
Author :
Arora, Samarth ; Arora, Kavita ; Aggarwal, Shubhashis Sengupta
Author_Institution :
CDAC, Noida, India
fYear :
2012
fDate :
9-12 Dec. 2012
Firstpage :
157
Lastpage :
162
Abstract :
The BTEC (Basic Travel Expression Corpus), developed by NICT, Japan, provides wide coverage of basic Japanese travel expressions with English counterparts and is intended as base data for developing high-quality speech translation systems. The English side of this corpus has been manually translated into Hindi for use in developing an English-Hindi speech translation system. In this paper, we present a statistical analysis of the translated Hindi BTEC corpus and describe the translation methodology adopted in developing it. The statistical evaluations performed in the experiments provide information on the distribution of sentences, words, and phonemes and on their growth behavior, which gives direction for future enhancement of the corpus.
Keywords :
language translation; natural language processing; speech processing; statistical analysis; word processing; Basic Travel Expression Corpus; English-Hindi speech translation system quality; Hindi BTEC speech database corpus translation; Japanese travel expressions; NICT; growth behavior; phoneme distribution information; sentence distribution information; statistical analysis; word distribution information; Sampling methods; Shape; Sociology; Speech; Statistical analysis; Vocabulary; Corpus statistics; Hindi BTEC; Speech Corpus;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Speech Database and Assessments (Oriental COCOSDA), 2012 International Conference on
Conference_Location :
Macau
Print_ISBN :
978-1-4673-2811-1
Electronic_ISBN :
978-1-4673-2812-8
Type :
conf
DOI :
10.1109/ICSDA.2012.6422480
Filename :
6422480