DocumentCode :
2575899
Title :
A multi-stream audio-video large-vocabulary Mandarin Chinese speech database
Author :
Liang, Luhong ; Luo, Yu ; Huang, Feiyue ; Nefian, Ara V.
Author_Institution :
Syst. Technol. Labs., Intel Corp., Santa Clara, CA, USA
Volume :
3
fYear :
2004
fDate :
27-30 June 2004
Firstpage :
1787
Abstract :
We present the acquisition and content of a multi-stream audio-visual large-vocabulary database in Mandarin Chinese. The database consists of 17,000 utterances spoken by 225 people and captured by a set of seven cameras and 12 microphones. We also provide label files that describe the endpoints of the utterances and script files that represent the actual pronunciation of the speech. The database can be used for audio-visual speech recognition (AVSR) in both large-vocabulary and small-vocabulary tasks, microphone-array-based speech recognition, audio-visual speaker identification, and 3D face modeling.
Keywords :
audio databases; multimedia databases; natural languages; speech recognition; visual databases; vocabulary; 3D face modeling; Mandarin Chinese speech database; audio-video database; audio-visual speech recognition; label files; large-vocabulary speech database; multi-stream speech database; script files; speaker identification; speech pronunciation; Audio databases; Cameras; Face recognition; Microphone arrays; Spatial databases; Speech recognition; Streaming media; Video sequences; Videoconference; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2004 IEEE International Conference on Multimedia and Expo (ICME '04)
Print_ISBN :
0-7803-8603-5
Type :
conf
DOI :
10.1109/ICME.2004.1394602
Filename :
1394602