Title :
Twin-HMM-based audio-visual speech enhancement
Author :
Abdelaziz, Ahmed Hussen ; Zeiler, Steffen ; Kolossa, Dorothea
Author_Institution :
Digital Signal Process. Group, Ruhr-Univ. Bochum, Bochum, Germany
Abstract :
Most approaches to speech signal processing rely solely on acoustic input, so spectrum estimation becomes exceedingly difficult when the signal-to-noise ratio drops to values near 0 dB. However, alternative sources of information are becoming widely available with the increasing use of multimedia data in everyday communication. In this paper, we propose using video input as an auxiliary modality for speech processing by applying a new statistical model, the twin hidden Markov model. The resulting enhancement algorithm for audiovisual data greatly outperforms the standard audio-only log-MMSE estimator on all considered instrumental speech quality measures, which cover both spectral and perceptual quality.
Keywords :
audio signal processing; audio-visual systems; estimation theory; hidden Markov models; least mean squares methods; multimedia communication; signal processing; speech enhancement; statistical analysis; audio-visual speech enhancement; audiovisual data algorithm; instrumental speech quality; multimedia data communication; perceptual quality; signal-to-noise ratio; spectral quality; spectrum estimation; speech signal processing; standard audio-only log-MMSE estimator; statistical model; twin hidden Markov model; twin-HMM; Hidden Markov models; Noise; Speech; Speech enhancement; Speech recognition; Vectors; Multimodal speech processing; audiovisual speech recognition; state-based speech enhancement
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6638354