DocumentCode :
312289
Title :
Using the visual component in automatic speech recognition
Author :
Brooke, N. Michael
Author_Institution :
Sch. of Math. Sci., Bath Univ., UK
Volume :
3
fYear :
1996
fDate :
3-6 Oct 1996
Firstpage :
1656
Abstract :
The movements of talkers' faces are known to convey visual cues that can improve speech intelligibility, especially where there is noise or hearing impairment. This suggests that visible facial gestures could be exploited to enhance speech intelligibility in automatic systems. Handling the volume of data represented by images of talkers' faces implies some form of data compression. Rather than using conventional feature extraction approaches, image coding and compression can be achieved using data-driven, statistically-oriented techniques such as artificial neural networks (ANNs) or principal component analysis (PCA). A major issue is the combination of the audio and visual data so that the best use can be made of the two modalities together. Perceptual experiments may offer guidance on suitable machine architectures, many of which currently use hidden Markov models (HMMs).
Keywords :
data compression; hidden Markov models; image coding; image recognition; neural nets; speech intelligibility; speech recognition; statistical analysis; visual perception; artificial neural networks; automatic speech recognition; data compression; data-driven techniques; hearing impairment; hidden Markov models; image coding; machine architectures; noise; perception; principal component analysis; speech intelligibility; statistically-oriented techniques; talkers' face movements; visible facial gestures; visual component; visual cues; Acoustic noise; Automatic speech recognition; Data compression; Feature extraction; Hidden Markov models; Image coding; Mouth; Principal component analysis; Speech enhancement; Speech recognition
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
Type :
conf
DOI :
10.1109/ICSLP.1996.607943
Filename :
607943