DocumentCode :
865868
Title :
Mixing Audiovisual Speech Processing and Blind Source Separation for the Extraction of Speech Signals From Convolutive Mixtures
Author :
Rivet, Bertrand ; Girin, Laurent ; Jutten, Christian
Author_Institution :
Inst. de la Commun. Parlee, Ecole Nationale d'Electronique et de Radioelectricite, Grenoble
Volume :
15
Issue :
1
fYear :
2007
Firstpage :
96
Lastpage :
108
Abstract :
Looking at the speaker's face can help listeners better hear a speech signal in a noisy environment and extract it from competing sources before identification. This suggests that the visual signals of speech (movements of the visible articulators) could be used in speech enhancement or extraction systems. In this paper, we present a novel algorithm that plugs the audiovisual coherence of speech signals, estimated by statistical tools, into audio blind source separation (BSS) techniques. This algorithm is applied to the difficult and realistic case of convolutive mixtures. The algorithm works mainly in the frequency (transform) domain, where the convolutive mixture becomes an additive (instantaneous) mixture in each frequency channel. Separation is performed frequency by frequency by an audio BSS algorithm. The audio and visual information is modeled by a newly proposed statistical model. This model is then used to resolve the standard source permutation and scale-factor ambiguities that arise in each frequency channel after the audio blind separation stage. The proposed method is shown to be effective for 2 × 2 convolutive mixtures and offers promising perspectives for extracting a particular speech source of interest from complex mixtures.
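The abstract's key step is that a convolutive mixture turns into one instantaneous mixture per frequency bin, after which per-bin separation leaves a permutation and scale ambiguity in every bin. The sketch below illustrates only that property, in pure Python, assuming circular convolution over a single DFT frame (with the short-time transforms used in practice the per-bin relation is only approximate, and holds well when frames are long relative to the filters). The filter taps, signal length, and helper names (`dft`, `circ_conv`, `convolutive_mix`, `unmix_bin`) are hypothetical. The per-bin inversion uses the true mixing matrices as an oracle; it is not the paper's audio BSS algorithm, and the audiovisual statistical model is not implemented here.

```python
import cmath
import random


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def circ_conv(h, s):
    """Circular convolution of a short FIR filter h with a length-n signal s."""
    n = len(s)
    return [sum(h[j] * s[(t - j) % n] for j in range(len(h))) for t in range(n)]


def convolutive_mix(H, S):
    """2x2 convolutive mixture: x_i = h_i1 * s_1 + h_i2 * s_2 (circular)."""
    return [[a + b for a, b in zip(circ_conv(H[i][0], S[0]), circ_conv(H[i][1], S[1]))]
            for i in range(2)]


n = 16
random.seed(0)
S = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(2)]  # toy sources
# Hypothetical short mixing filters h_ij standing in for room impulse responses.
H = [[[1.0, 0.5, 0.2], [0.6, 0.3]],
     [[0.4, -0.2], [1.0, 0.1, 0.05]]]

X = convolutive_mix(H, S)
Xf = [dft(x) for x in X]
Sf = [dft(s) for s in S]
# Af[i][j][k]: frequency response of h_ij at bin k (zero-padded to n).
Af = [[dft(H[i][j] + [0.0] * (n - len(H[i][j]))) for j in range(2)] for i in range(2)]

# In every bin k, X(k) = A(k) S(k): an instantaneous (complex) 2x2 mixture.
max_err = max(abs(Xf[i][k] - (Af[i][0][k] * Sf[0][k] + Af[i][1][k] * Sf[1][k]))
              for i in range(2) for k in range(n))
print(max_err < 1e-9)


def unmix_bin(k):
    """Oracle per-bin separation: apply A(k)^-1 (uses the true mixing, not BSS)."""
    a, b = Af[0][0][k], Af[0][1][k]
    c, d = Af[1][0][k], Af[1][1][k]
    det = a * d - b * c
    x1, x2 = Xf[0][k], Xf[1][k]
    return (d * x1 - b * x2) / det, (-c * x1 + a * x2) / det


Yf = [unmix_bin(k) for k in range(n)]
rec_err = max(abs(Yf[k][j] - Sf[j][k]) for k in range(n) for j in range(2))
print(rec_err < 1e-9)

# A blind per-bin method only recovers P(k) D(k) S(k). Swapping and rescaling the
# outputs in odd bins is still a valid separation bin by bin, yet output 0 now
# mixes the two sources across frequency: this is the ambiguity to be resolved.
scrambled = [(2.0 * y2, -y1) if k % 2 else (y1, y2) for k, (y1, y2) in enumerate(Yf)]
```

Both checks print `True`. The `scrambled` outputs show why a per-bin BSS stage alone is not enough: each bin is individually separated, but without an extra cue (in the paper, the audiovisual model) nothing tells the algorithm which output belongs to which speaker, or with what gain, in each bin.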
Keywords :
audio signal processing; blind source separation; feature extraction; frequency-domain analysis; speech processing; statistical analysis; transforms; audio blind source separation; audiovisual speech processing; blind source separation; convolutive mixtures; extraction systems; frequency separation; plugging audiovisual coherence; source permutation; speech enhancement; speech signals extraction; Acoustic noise; Blind source separation; Coherence; Frequency; Noise robustness; Signal processing; Source separation; Speech enhancement; Speech processing; Working environment noise; Audiovisual coherence; blind source separation; convolutive mixture; speech enhancement; statistical modeling;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2006.872619
Filename :
4032792