Title:
Gaze-contingent automatic speech recognition
Author:
Cooke, N.J.; Russell, Matthew
Author_Institution:
Dept. of Electronic, Electrical & Computer Engineering, School of Engineering, University of Birmingham, Birmingham, UK
Date:
12/1/2008
Abstract:
Progress has been made in improving speech recognition using a tightly coupled modality such as lip movement, and in using additional input interfaces to improve command recognition in multimodal human-computer interfaces such as combined speech and pen-based systems. However, little work has attempted to improve the recognition of spontaneous, conversational speech by adding information from a loosely coupled modality. This study investigated that idea by integrating information from gaze into an automatic speech recognition (ASR) system. A probabilistic framework for multimodal recognition was formalised and applied to the specific case of integrating gaze and speech. Gaze-contingent ASR systems were developed from a baseline ASR system by redistributing language-model probability mass according to visual attention. These systems were tested on a corpus of matched eye-movement and spontaneous conversational British English speech segments (n=1355) recorded during a visual, goal-driven task. The best-performing systems matched the baseline ASR system's word error rate and showed an increase in keyword-spotting accuracy. The findings may be useful for developing robust speech-centric multimodal decoding systems.
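The central mechanism described in the abstract, shifting language-model probability mass toward words associated with the user's current visual attention, can be illustrated with a minimal sketch. This is a hypothetical unigram rescoring example, not the authors' actual formulation: the word lists, the `boost` parameter, and the function name are illustrative assumptions.

```python
# Hypothetical sketch of gaze-contingent language-model rescoring:
# a fraction of the probability mass is shifted onto words linked to
# the currently fixated visual objects, then the model is renormalised.
# The boost factor and vocabulary are illustrative, not from the paper.

def redistribute_lm(unigram_probs, gazed_words, boost=0.3):
    """Shift `boost` of the non-gazed probability mass onto the words
    associated with the current visual attention."""
    gazed = {w for w in gazed_words if w in unigram_probs}
    gazed_mass = sum(unigram_probs[w] for w in gazed)
    other_mass = 1.0 - gazed_mass
    if not gazed or other_mass <= 0.0:
        return dict(unigram_probs)  # nothing to redistribute
    target_gazed = gazed_mass + boost * other_mass
    rescored = {}
    for w, p in unigram_probs.items():
        if w in gazed:
            rescored[w] = p * target_gazed / gazed_mass
        else:
            rescored[w] = p * (1.0 - target_gazed) / other_mass
    return rescored

# Illustrative use: the user is fixating an object labelled "cat".
lm = {"cat": 0.2, "dog": 0.3, "tree": 0.5}
new_lm = redistribute_lm(lm, {"cat"})
# "cat" gains probability mass; the distribution still sums to 1.
```

In the paper's setting the same idea would be applied within a full ASR language model rather than a toy unigram distribution, with the visual-attention signal derived from eye movements over the task display.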
Keywords:
human computer interaction; speech recognition; gaze-contingent automatic speech recognition; input interface; language model probability; lip movement; loosely-coupled modality; multimodal human-computer interface; multimodal recognition; pen-based system; speech-centric multimodal decoding system function; tightly-coupled modality;
Journal_Title:
IET Signal Processing
DOI:
10.1049/iet-spr:20070127