Asynchronous integration of visual information in an automatic speech recognition system

Author

Alissali, Mamoun ; Deléglise, Paul ; Rogozan, Alexandrina

Author_Institution

Lab. d'Inf., Maine Univ., Le Mans, France

Volume

1

fYear

1996

fDate

3-6 Oct 1996

Firstpage

34

Abstract

Deals with the integration of visual data in automatic speech recognition systems. We first describe the framework of our research: the development of advanced multi-user multi-modal interfaces. Then we present audio-visual speech recognition problems in general, and those we are interested in in particular. After a very brief discussion of existing systems, we present the architecture of our audio-only reference and baseline systems and describe our audio-visual systems. The major part of the paper describes the systems we developed according to two different approaches to the problem of integrating visual data in speech recognition systems. We first describe a system developed according to the first approach (called the direct integration model) and show its limitations. Our approach, which we call asynchronous integration, is then presented. After the general guidelines, we go into some detail about the distributed architecture and the variant of the N-best algorithm we developed to implement this approach. The performance of these different systems is compared, and we conclude with a brief discussion of the performance improvements we have obtained and of future work.

Keywords

audio-visual systems; parallel architectures; software performance evaluation; speech recognition; user interfaces; N-best algorithm; advanced multi-user multi-modal interfaces; asynchronous integration; audio-only systems architecture; audio-visual speech recognition problems; automatic speech recognition system; direct integration model; distributed architecture; performance improvements; visual information integration; Acoustic noise; Acoustic testing; Automatic speech recognition; Automatic testing; Guidelines; Noise level; Noise robustness; Probability distribution; Speech enhancement; Speech recognition

fLanguage

English

Publisher

IEEE

Conference_Titel

Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on

Conference_Location

Philadelphia, PA

Print_ISBN

0-7803-3555-4

Type

conf

DOI

10.1109/ICSLP.1996.607018

Filename

607018