DocumentCode :
2245526
Title :
Asynchronous integration of visual information in an automatic speech recognition system
Author :
Alissali, Mamoun ; Deléglise, Paul ; Rogozan, Alexandrina
Author_Institution :
Lab. d'Inf., Maine Univ., Le Mans, France
Volume :
1
fYear :
1996
fDate :
3-6 Oct 1996
Firstpage :
34
Abstract :
Deals with the integration of visual data in automatic speech recognition systems. We first describe the framework of our research: the development of advanced multi-user multi-modal interfaces. We then present audio-visual speech recognition problems in general, and the ones we are interested in in particular. After a very brief discussion of existing systems, we present the architecture of our audio-only reference and baseline systems and describe our audio-visual systems. The major part of the paper describes the systems we developed according to two different approaches to the problem of integrating visual data in speech recognition systems. We first describe a system developed according to the first approach (called the direct integration model) and show its limitations. Our own approach, which we call asynchronous integration, is then presented. After the general guidelines, we give some details of the distributed architecture and the variant of the N-best algorithm we developed to implement this approach. The performance of these different systems is compared, and we conclude with a brief discussion of the performance improvements we have obtained and of future work.
Keywords :
audio-visual systems; parallel architectures; software performance evaluation; speech recognition; user interfaces; N-best algorithm; advanced multi-user multi-modal interfaces; asynchronous integration; audio-only systems architecture; audio-visual speech recognition problems; automatic speech recognition system; direct integration model; distributed architecture; performance improvements; visual information integration; Acoustic noise; Acoustic testing; Automatic speech recognition; Automatic testing; Guidelines; Noise level; Noise robustness; Probability distribution; Speech enhancement; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
Type :
conf
DOI :
10.1109/ICSLP.1996.607018
Filename :
607018