DocumentCode :
591472
Title :
Collection and annotation of Malay conversational speech corpus
Author :
Tze Yuang Chong ; Xiong Xiao ; Tien-Ping Tan ; Eng Siong Chng ; Haizhou Li
Author_Institution :
Temasek Lab., Nanyang Technol. Univ., Singapore, Singapore
fYear :
2012
fDate :
9-12 Dec. 2012
Firstpage :
30
Lastpage :
35
Abstract :
We report the development of a Malay conversational speech corpus as part of our research in spontaneous conversational speech LVCSR. This corpus development effort is a collaboration between NTU and USM. The goal is to collect, transcribe, and annotate 50 hours of conversational Malay speech. The conversations are recorded from both close-talk and telephone channels, and the two speakers' utterances are kept in separate tracks. Besides the word transcription, we also annotate linguistic phenomena such as fillers and disfluencies. To date, 20 hours have been recorded, transcribed, and analyzed. The details of our analysis are presented in this report.
Keywords :
linguistics; natural language processing; speech recognition; LVCSR; Malay conversational speech corpus; NTU; USM; close-talk; linguistics phenomena; speaker utterances; telephone channels; word transcription; Computers; Educational institutions; Pragmatics; Speech; Speech processing; Speech recognition; Vocabulary; Malay corpus; conversational speech; spontaneous speech
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Speech Database and Assessments (Oriental COCOSDA), 2012 International Conference on
Conference_Location :
Macau
Print_ISBN :
978-1-4673-2811-1
Electronic_ISBN :
978-1-4673-2812-8
Type :
conf
DOI :
10.1109/ICSDA.2012.6422473
Filename :
6422473