Regional Information Center for Science and Technology - Mixing Audiovisual Speech Processing and Blind Source Separation for the Extraction of Speech Signals From Convolutive Mixtures

DocumentCode :

865868

Title :

Mixing Audiovisual Speech Processing and Blind Source Separation for the Extraction of Speech Signals From Convolutive Mixtures

Author :

Rivet, Bertrand ; Girin, Laurent ; Jutten, Christian

Author_Institution :

Inst. de la Commun. Parlee, Ecole Nationale d'Electronique et de Radioelectricite, Grenoble

Volume :

Issue :

fYear :

2007

Firstpage :

Lastpage :

108

Abstract :

Looking at the speaker's face can be useful to better hear a speech signal in a noisy environment and to extract it from competing sources before identification. This suggests that the visual signals of speech (movements of visible articulators) could be used in speech enhancement or extraction systems. In this paper, we present a novel algorithm that plugs the audiovisual coherence of speech signals, estimated by statistical tools, into audio blind source separation (BSS) techniques. This algorithm is applied to the difficult and realistic case of convolutive mixtures. The algorithm mainly works in the frequency (transform) domain, where the convolutive mixture becomes an additive mixture for each frequency channel. Frequency-by-frequency separation is performed by an audio BSS algorithm. The audio and visual information is modeled by a newly proposed statistical model. This model is then used to solve the standard source permutation and scale factor ambiguities encountered for each frequency after the audio blind separation stage. The proposed method is shown to be efficient in the case of 2 × 2 convolutive mixtures and offers promising perspectives for extracting a particular speech source of interest from complex mixtures.
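
The abstract's remark that a convolutive mixture turns into a simple per-frequency mixture can be checked numerically. The sketch below is an illustration only and is not the authors' algorithm: it assumes circular convolution (for which the identity is exact, whereas with short-time transforms it is only an approximation that requires analysis windows long relative to the mixing filters), and the names `dft`, `mix`, `max_bin_error`, the filter taps, and the random sources are all hypothetical.

```python
import cmath
import random


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def circular_convolve(h, s):
    """Circular convolution of a short filter h with a signal s."""
    n = len(s)
    return [sum(h[k] * s[(t - k) % n] for k in range(len(h))) for t in range(n)]


def mix(filters, sources):
    """Convolutive mixture: x_i = sum_j (h_ij circularly convolved with s_j)."""
    n = len(sources[0])
    mixtures = []
    for row in filters:
        x = [0.0] * n
        for h, s in zip(row, sources):
            x = [a + b for a, b in zip(x, circular_convolve(h, s))]
        mixtures.append(x)
    return mixtures


def max_bin_error(filters, sources):
    """Largest deviation, over all frequency bins, between the DFT of each
    mixture and the per-bin mixture sum_j H_ij(f) * S_j(f)."""
    n = len(sources[0])
    X = [dft(x) for x in mix(filters, sources)]
    S = [dft(s) for s in sources]
    # Zero-pad each filter to the signal length before transforming it.
    H = [[dft(h + [0.0] * (n - len(h))) for h in row] for row in filters]
    err = 0.0
    for f in range(n):
        for i in range(len(filters)):
            predicted = sum(H[i][j][f] * S[j][f] for j in range(len(sources)))
            err = max(err, abs(X[i][f] - predicted))
    return err


random.seed(0)
N = 16
# Two hypothetical sources and a hypothetical 2 x 2 set of 3-tap mixing filters.
sources = [[random.gauss(0, 1) for _ in range(N)] for _ in range(2)]
filters = [[[1.0, 0.5, 0.25], [0.3, -0.2, 0.1]],
           [[0.4, 0.1, -0.3], [1.0, -0.6, 0.2]]]

# The error is at floating-point rounding level, so the check succeeds.
print(max_bin_error(filters, sources) < 1e-9)
```

Because each frequency bin is separated independently, the order and scale of the recovered sources can differ from bin to bin, which is the permutation and scale ambiguity the abstract says the audiovisual model is used to resolve.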

Keywords :

audio signal processing; blind source separation; feature extraction; frequency-domain analysis; speech processing; statistical analysis; transforms; audio blind source separation; audiovisual speech processing; blind source separation; convolutive mixtures; extraction systems; frequency separation; plugging audiovisual coherence; source permutation; speech enhancement; speech signals extraction; Acoustic noise; Blind source separation; Coherence; Frequency; Noise robustness; Signal processing; Source separation; Speech enhancement; Speech processing; Working environment noise; Audiovisual coherence; blind source separation; convolutive mixture; speech enhancement; statistical modeling;

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2006.872619

Filename :

4032792

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=865868