  • DocumentCode
    2575899
  • Title
    A multi-stream audio-video large-vocabulary Mandarin Chinese speech database
  • Author
    Liang, Luhong ; Luo, Yu ; Huang, Feiyue ; Nefian, Ara V.
  • Author_Institution
    Syst. Technol. Labs., Intel Corp., Santa Clara, CA, USA
  • Volume
    3
  • fYear
    2004
  • fDate
    27-30 June 2004
  • Firstpage
    1787
  • Abstract
    We present the acquisition and content of a multi-stream audio-visual large-vocabulary database in Mandarin Chinese. The database consists of 17,000 utterances spoken by 225 people and captured by a set of seven cameras and 12 microphones. We also provide label files that describe the endpoints of the utterances and script files that represent the actual pronunciation of the speech. The database can be used for audio-visual speech recognition (AVSR) in both large-vocabulary and small-vocabulary tasks, microphone-array-based speech recognition, audio-visual speaker identification, and 3D face modeling.
  • Keywords
    audio databases; multimedia databases; natural languages; speech recognition; visual databases; vocabulary; 3D face modeling; Mandarin Chinese speech database; audio-video database; audio-visual speech recognition; label files; large-vocabulary speech database; multi-stream speech database; script files; speaker identification; speech pronunciation; Audio databases; Cameras; Face recognition; Microphone arrays; Spatial databases; Speech recognition; Streaming media; Video sequences; Videoconference; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia and Expo, 2004. ICME '04. 2004 IEEE International Conference on
  • Print_ISBN
    0-7803-8603-5
  • Type
    conf
  • DOI
    10.1109/ICME.2004.1394602
  • Filename
    1394602