Title :
Development of a Chinese telephony conversational corpus for speech processing [speech recognition applications]
Author :
Yi, Liu ; Fung, Pascale ; Huang, Shudong ; Cieri, Christopher ; Lufeng, Zhai ; Benfeng, Chen
Author_Institution :
Human Language Technology Center, Hong Kong University of Science and Technology, China
Abstract :
This paper describes the development of the EARS (Effective, Affordable, Reusable Speech-to-Text) Chinese corpus, a telephony conversational speech database for speech processing. The EARS database is the first of its kind collected for spontaneous Mandarin Chinese telephony speech. The corpus is designed to collect Mandarin conversations between strangers or friends, covering a wide range of topics, over landline and cellular channels. All speech data are annotated with standard Chinese character transcriptions as well as specific mark-ups for spontaneous speech. The corpus will be used for conversational and spontaneous Mandarin speech recognition tasks under the DARPA EARS framework. This paper introduces the design, development, structure, and initial phonetic analysis of the first 50-hour collection of the corpus. An additional 300 to 500 hours of data will be collected and transcribed between 2004 and 2005.
Keywords :
audio databases; natural languages; speech processing; speech recognition; Chinese character transcription; Chinese telephony conversational corpus; Chinese telephony conversational speech database; DARPA EARS framework; EARS Chinese corpus; Mandarin Chinese telephony spontaneous speech; Mandarin conversations; annotated speech data; conversational Mandarin speech recognition; phonetic analysis; spontaneous Mandarin speech recognition; spontaneous speech mark-ups; Automatic speech recognition; Databases; Ear; Loudspeakers; Microphones; Speech synthesis; Telephony
Conference_Title :
Chinese Spoken Language Processing, 2004 International Symposium on
Print_ISBN :
0-7803-8678-7
DOI :
10.1109/CHINSL.2004.1409620