Regional Information Center for Science and Technology - The multi-channel Wall Street Journal audio visual corpus (MC-WSJ-AV): specification and initial experiments

DocumentCode :

2874957

Title :

The multi-channel Wall Street Journal audio visual corpus (MC-WSJ-AV): specification and initial experiments

Author :

Lincoln, Mike ; McCowan, Iain ; Vepa, Jithendra ; Maganti, Hari Krishna

Author_Institution :

Centre for Speech Technol. Res., Edinburgh Univ.

fYear :

2005

fDate :

27 Nov. 2005

Firstpage :

357

Lastpage :

362

Abstract :

The recognition of speech in meetings poses a number of challenges to current automatic speech recognition (ASR) techniques. Meetings typically take place in rooms with non-ideal acoustic conditions and significant background noise, and may contain large sections of overlapping speech. In such circumstances, headset microphones have to date provided the best recognition performance; however, participants are often reluctant to wear them. Microphone arrays provide an alternative to close-talking microphones by providing speech enhancement through directional discrimination. Unfortunately, development of array front-end systems for state-of-the-art large vocabulary continuous speech recognition suffers from a lack of necessary resources, as most available speech corpora consist only of single-channel recordings. This paper describes the collection of an audio-visual corpus of read speech from a number of instrumented meeting rooms. The corpus, based on the WSJCAM0 database, is suitable for use in continuous speech recognition experiments and is captured using a variety of microphones, including arrays, as well as close-up and wider-angle cameras. The paper also describes some initial ASR experiments on the corpus comparing the use of close-talking microphones with both a fixed and a blind array beamforming technique.

Keywords :

audio signal processing; microphone arrays; office automation; speech enhancement; speech recognition; vocabulary; array front-end systems; automatic speech recognition; background noise; blind array beamforming; close-talking microphones; headset microphones; microphone arrays; multichannel Wall Street Journal audio visual corpus; overlapping speech; speech enhancement; vocabulary; Audio recording; Automatic speech recognition; Background noise; Cameras; Databases; Instruments; Microphone arrays; Speech enhancement; Speech recognition; Vocabulary

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding, 2005 IEEE Workshop on

Conference_Location :

San Juan

Print_ISBN :

0-7803-9478-X

Electronic_ISBN :

0-7803-9479-8

Type :

conf

DOI :

10.1109/ASRU.2005.1566470

Filename :

1566470

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2874957