DocumentCode
2575899
Title
A multi-stream audio-video large-vocabulary Mandarin Chinese speech database
Author
Liang, Luhong ; Luo, Yu ; Huang, Feiyue ; Nefian, Ara V.
Author_Institution
Syst. Technol. Labs., Intel Corp., Santa Clara, CA, USA
Volume
3
fYear
2004
fDate
27-30 June 2004
Firstpage
1787
Abstract
We present the acquisition and content of a multi-stream audio-visual large-vocabulary database in Mandarin Chinese. The database consists of 17,000 utterances spoken by 225 people and captured by a set of seven cameras and 12 microphones. We also provide the label files that describe the endpoints of the utterances and the script files that represent the actual pronunciation of speech. The database can be used in audio-visual speech recognition (AVSR) for both large-vocabulary and small-vocabulary tasks, microphone-array-based speech recognition, audio-visual speaker identification, and 3D face modeling.
Keywords
audio databases; multimedia databases; natural languages; speech recognition; visual databases; vocabulary; 3D face modeling; Mandarin Chinese speech database; audio-video database; audio-visual speech recognition; label files; large-vocabulary speech database; multi-stream speech database; script files; speaker identification; speech pronunciation; Audio databases; Cameras; Face recognition; Microphone arrays; Spatial databases; Speech recognition; Streaming media; Video sequences; Videoconference; Vocabulary
fLanguage
English
Publisher
IEEE
Conference_Titel
Multimedia and Expo, 2004. ICME '04. 2004 IEEE International Conference on
Print_ISBN
0-7803-8603-5
Type
conf
DOI
10.1109/ICME.2004.1394602
Filename
1394602