DocumentCode :
1403230
Title :
Visually Derived Wiener Filters for Speech Enhancement
Author :
Almajai, Ibrahim ; Milner, B.
Author_Institution :
Sch. of Comput. Sci., Univ. of East Anglia, Norwich, UK
Volume :
19
Issue :
6
Year :
2011
Firstpage :
1642
Lastpage :
1651
Abstract :
The aim of this work is to examine whether visual speech information can be used to enhance audio speech that has been contaminated by noise. First, an analysis of audio and visual speech features is made, which identifies the pair with highest audio-visual correlation. The study also reveals that higher audio-visual correlation exists within individual phoneme sounds rather than globally across all speech. This correlation is exploited in the proposal of a visually derived Wiener filter that obtains clean speech and noise power spectrum statistics from visual speech features. Clean speech statistics are estimated from visual features using a maximum a posteriori framework that is integrated within the states of a network of hidden Markov models to provide phoneme localization. Noise statistics are obtained through a novel audio-visual voice activity detector which utilizes visual speech features to make robust speech/nonspeech classifications. The effectiveness of the visually derived Wiener filter is evaluated subjectively and objectively and is compared with three different audio-only enhancement methods over a range of signal-to-noise ratios.
Keywords :
Wiener filters; hidden Markov models; noise; speech enhancement; statistical analysis; audio speech; audio-visual correlation; clean speech statistics; maximum a posteriori framework; noise power spectrum statistics; noise statistics; phoneme localization; visual speech features; visual speech information; visually derived Wiener filters; Correlation; Feature extraction; Shape; Speech; Visualization; Audio-visual; maximum a posteriori
Language :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2010.2096212
Filename :
5667044