DocumentCode
312289
Title
Using the visual component in automatic speech recognition
Author
Brooke, N. Michael
Author_Institution
Sch. of Math. Sci., Bath Univ., UK
Volume
3
fYear
1996
fDate
3-6 Oct 1996
Firstpage
1656
Abstract
The movements of talkers' faces are known to convey visual cues that can improve speech intelligibility, especially where there is noise or hearing impairment. This suggests that visible facial gestures could be exploited to enhance speech intelligibility in automatic systems. Handling the volume of data represented by images of talkers' faces implies some form of data compression. Rather than using conventional feature extraction approaches, image coding and compression can be achieved using data-driven, statistically-oriented techniques such as artificial neural networks (ANNs) or principal component analysis (PCA). A major issue is the combination of the audio and visual data so that the best use can be made of the two modalities together. Perceptual experiments may offer guidance on suitable machine architectures, many of which currently use hidden Markov models (HMMs).
Keywords
data compression; hidden Markov models; image coding; image recognition; neural nets; speech intelligibility; speech recognition; statistical analysis; visual perception; artificial neural networks; automatic speech recognition; data compression; data-driven techniques; hearing impairment; hidden Markov models; image coding; machine architectures; noise; perception; principal component analysis; speech intelligibility; statistically-oriented techniques; talkers' face movements; visible facial gestures; visual component; visual cues; Acoustic noise; Automatic speech recognition; Data compression; Feature extraction; Hidden Markov models; Image coding; Mouth; Principal component analysis; Speech enhancement; Speech recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location
Philadelphia, PA
Print_ISBN
0-7803-3555-4
Type
conf
DOI
10.1109/ICSLP.1996.607943
Filename
607943