Regional Information Center for Science and Technology (RICeST) - Viseme definitions comparison for visual-only speech recognition

DocumentCode :

695667

Title :

Viseme definitions comparison for visual-only speech recognition

Author :

Cappelletta, Luca ; Harte, Naomi

Author_Institution :

Dept. of Electron. & Electr. Eng, Trinity Coll. Dublin, Dublin, Ireland

fYear :

2011

fDate :

Aug. 29 - Sept. 2, 2011

Firstpage :

2109

Lastpage :

2113

Abstract :

Audio-visual speech recognition (AVSR) involves recognising what a speaker is uttering using both audio and visual cues. While phonemes, the units of speech in the audio domain, are well documented, the same is not true of the speech units in the visual domain: visemes. The literature recognises only a generic viseme definition. There is no agreement on what visemes imply in practice, or on whether they relate to mouth position alone or to mouth movement. In this paper a visual-only speech recognition system is presented, trained using either PCA or optical flow visual features. The recognition rate changes depending on which practical viseme definition is used. Four viseme definitions were tested, and the results are analysed to establish which of the four candidates performs best.
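As a hedged illustration of what a "practical viseme definition" amounts to, the sketch below treats a definition as a many-to-one mapping from phoneme labels to viseme classes and uses it to turn a phoneme transcription into the viseme labels a visual-only recogniser would be trained on. The table `HYPOTHETICAL_VISEME_MAP`, its ARPAbet-style labels, and the function `phonemes_to_visemes` are hypothetical: they are not any of the four definitions compared in the paper. Merging consecutive identical visemes is also an assumption of this sketch, not a step confirmed by the source.

```python
from typing import Dict, List

# HYPOTHETICAL phoneme-to-viseme table for illustration only. It is NOT one of
# the four viseme definitions evaluated in the paper; it simply groups a few
# phonemes that are often described as looking alike on the lips.
HYPOTHETICAL_VISEME_MAP: Dict[str, str] = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "th": "V_dental", "dh": "V_dental",
    "t": "V_alveolar", "d": "V_alveolar", "s": "V_alveolar", "z": "V_alveolar",
    "aa": "V_open", "ae": "V_open",
    "uw": "V_rounded", "ow": "V_rounded",
    "sil": "V_sil",
}


def phonemes_to_visemes(phonemes: List[str], mapping: Dict[str, str]) -> List[str]:
    """Map a phoneme transcription to viseme labels under a given definition.

    Consecutive phonemes that fall into the same viseme class are merged into a
    single viseme label (an assumption of this sketch). A phoneme missing from
    the mapping raises KeyError, so gaps in a definition are caught early.
    """
    visemes: List[str] = []
    for ph in phonemes:
        v = mapping[ph]
        if not visemes or visemes[-1] != v:
            visemes.append(v)
    return visemes


if __name__ == "__main__":
    transcript = ["sil", "m", "b", "aa", "t", "s", "sil"]
    print(phonemes_to_visemes(transcript, HYPOTHETICAL_VISEME_MAP))
    # 16 phoneme labels collapse into 7 viseme classes under this table
    print(len(HYPOTHETICAL_VISEME_MAP), len(set(HYPOTHETICAL_VISEME_MAP.values())))
```

Swapping in a different table changes how many classes the recogniser must learn and which phonemes it is allowed to confuse, which is the kind of variation the abstract reports as affecting the recognition rate.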

Keywords :

image sequences; principal component analysis; speech recognition; AVSR; PCA; audio cues; audio domain; audio-visual speech recognition; generic viseme definition; optical flow visual features; recognition rate; speech units; visual cues; visual domain; visual-only speech recognition system; Databases; Feature extraction; Hidden Markov models; Principal component analysis; Speech; Speech recognition; Visualization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

19th European Signal Processing Conference, 2011

Conference_Location :

Barcelona

ISSN :

2076-1465

Type :

conf

Filename :

7074217

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=695667