DocumentCode :
2262248
Title :
Audio-visual speech perception without speech cues
Author :
Saldana, Helena M. ; Pisoni, David B. ; Fellowes, Jennifer M. ; Remez, Robert E.
Author_Institution :
Speech Res. Lab., Indiana Univ., Bloomington, IN, USA
Volume :
4
fYear :
1996
fDate :
3-6 Oct 1996
Firstpage :
2187
Abstract :
A series of experiments was conducted in which listeners were presented with audio-visual sentences in a transcription task. The visual components of the stimuli consisted of a male talker's face. The acoustic components consisted of: (1) natural speech; (2) envelope-shaped noise which preserved the duration and amplitude of the original speech waveform; and (3) various types of sine wave speech signals that followed the formant frequencies of a natural utterance. Sine wave speech is a skeletonized version of a natural utterance which contains frequency and amplitude variation of the formants, but lacks any fine-grained acoustic structure of speech. Intelligibility of the present set of sine wave sentences was relatively low in contrast to previous findings (Remez, Rubin, Pisoni, and Carrell, 1981). However, intelligibility was greatly increased when visual information from a talker's face was presented along with the auditory stimuli. Further experiments demonstrated that the intelligibility of single tones increased differentially depending on which formant analog was presented. It was predicted that the increase in intelligibility for the sine wave speech with an added video display would be greater than the gain observed with envelope-shaped noise. This prediction is based on the assumption that the information-bearing phonetic properties of spoken utterances are preserved in the audio-visual sine wave conditions.
Keywords :
acoustic noise; audio-visual systems; natural languages; speech intelligibility; acoustic components; audio-visual sentences; audio-visual speech perception; auditory stimuli; envelope-shaped noise; formant amplitude variation; formant frequencies; formant frequency variation; information-bearing phonetic properties; intelligibility; listeners; male talker face; natural speech; natural utterance; sine wave speech signals; single tone intelligibility; skeletonized natural utterance; speech waveform; transcription task; video display; visual components; Acoustic noise; Acoustic waves; Educational institutions; Frequency; Laboratories; Natural languages; Noise level; Psychology; Speech enhancement; Speech synthesis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
Type :
conf
DOI :
10.1109/ICSLP.1996.607238
Filename :
607238