Regional Information Center for Science and Technology (RICeST) - Designing a multimodal corpus of audio-visual speech using a high-speed camera

DocumentCode :

1843177

Title :

Designing a multimodal corpus of audio-visual speech using a high-speed camera

Author :

Karpov, Aleksey ; Ronzhin, Anatoly ; Kipyatkova, Irina

Author_Institution :

Speech & Multimodal Interfaces Lab., St. Petersburg Inst. for Inf. & Autom., St. Petersburg, Russia

Volume :

fYear :

2012

fDate :

21-25 Oct. 2012

Firstpage :

519

Lastpage :

522

Abstract :

This paper presents research on designing and processing an audio-visual speech database for an automatic Russian speech recognition system, recorded with an Oktava MK-012 microphone and a JAI Pulnix RMC-6740GE high-speed camera (200 frames per second). The audio-visual speech recording system developed for this purpose is described; it synchronizes and fuses the audio and video data recorded by the independent sensors. The system automatically detects voice activity in the audio signal and stores only speech fragments, discarding non-informative signals. It also accounts for and processes the natural asynchrony between the two speech modalities. Methods are presented for extracting acoustic features (based on Mel-frequency cepstral coefficients) and visual speech features (pixel-based features of the mouth region), and for temporal segmentation of the multimodal data (by forced alignment).
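
The abstract does not say how voice activity is detected or how speech fragments are matched to video frames. As a rough illustration only, the sketch below uses a simple short-time energy threshold (a swapped-in technique, not necessarily the authors' method) and then maps each detected audio fragment to the range of 200 fps video frames it overlaps. The function names, the 16 kHz sample rate, the frame length, and the threshold are all assumptions made for this example; only the 200 frames-per-second figure comes from the record.

```python
from typing import List, Sequence, Tuple


def detect_speech_segments(
    samples: Sequence[float],
    frame_len: int = 160,      # assumed: 10 ms frames at an assumed 16 kHz rate
    threshold: float = 0.01,   # assumed mean-energy threshold
    min_frames: int = 3,       # assumed minimum run length to count as speech
) -> List[Tuple[int, int]]:
    """Return (start_sample, end_sample) pairs for runs of high-energy frames."""
    segments: List[Tuple[int, int]] = []
    run_start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold:
            if run_start is None:
                run_start = i          # a speech run begins here
        elif run_start is not None:
            if i - run_start >= min_frames:
                segments.append((run_start * frame_len, i * frame_len))
            run_start = None           # the run ended; short runs are discarded
    # Close a run that lasts until the end of the signal.
    if run_start is not None and n_frames - run_start >= min_frames:
        segments.append((run_start * frame_len, n_frames * frame_len))
    return segments


def video_frames_for_segment(
    start_sample: int,
    end_sample: int,
    sample_rate: int = 16000,  # assumed audio sample rate
    fps: int = 200,            # camera frame rate stated in the abstract
) -> Tuple[int, int]:
    """Map an audio sample range to a half-open range of video frame indices."""
    first = start_sample * fps // sample_rate
    last = (end_sample * fps + sample_rate - 1) // sample_rate  # ceiling
    return first, last


# Synthetic signal: 0.1 s silence, 0.1 s "speech", 0.1 s silence.
signal = [0.0] * 1600 + [0.5] * 1600 + [0.0] * 1600
speech = detect_speech_segments(signal)
frames = [video_frames_for_segment(s, e) for s, e in speech]
```

At 200 frames per second each video frame covers 5 ms, so under the assumed 16 kHz audio rate one video frame corresponds to 80 audio samples. A real system would also need the asynchrony handling mentioned in the abstract, which this sketch does not attempt.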

Keywords :

audio databases; image sensors; speech recognition; JAI Pulnix RMC-6740GE; Mel frequency cepstral coefficients; Oktava MK-012 microphone; Russian speech recognition system; audio signal; audio visual speech database; high speed camera; independent sensors; informative signals; mouth region; multimodal corpus design; multimodal data temporal segmentation; pixel based features; speech fragments; voice activity; audio-visual speech; automatic speech recognition; computer vision; high-speed camera; multimodal system;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Signal Processing (ICSP), 2012 IEEE 11th International Conference on

Conference_Location :

Beijing

ISSN :

2164-5221

Print_ISBN :

978-1-4673-2196-9

Type :

conf

DOI :

10.1109/ICoSP.2012.6491539

Filename :

6491539

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1843177