Title :
A multi-stream audio-video large-vocabulary Mandarin Chinese speech database
Author :
Liang, Luhong ; Luo, Yu ; Huang, Feiyue ; Nefian, Ara V.
Author_Institution :
Syst. Technol. Labs., Intel Corp., Santa Clara, CA, USA
Abstract :
We present the acquisition and content of a multi-stream audio-visual large-vocabulary database in Mandarin Chinese. The database consists of 17,000 utterances spoken by 225 people and captured by a set of seven cameras and 12 microphones. We also provide label files that mark the endpoints of the utterances and script files that represent the actual pronunciation of the speech. The database can be used for audio-visual speech recognition (AVSR) in both large-vocabulary and small-vocabulary tasks, microphone-array-based speech recognition, audio-visual speaker identification, and 3D face modeling.
Keywords :
audio databases; multimedia databases; natural languages; speech recognition; visual databases; vocabulary; 3D face modeling; Mandarin Chinese speech database; audio-video database; audio-visual speech recognition; label files; large-vocabulary speech database; multi-stream speech database; script files; speaker identification; speech pronunciation; Audio databases; Cameras; Face recognition; Microphone arrays; Spatial databases; Speech recognition; Streaming media; Video sequences; Videoconference; Vocabulary;
Conference_Title :
2004 IEEE International Conference on Multimedia and Expo (ICME '04)
Print_ISBN :
0-7803-8603-5
DOI :
10.1109/ICME.2004.1394602