DocumentCode :
69849
Title :
Group Delay Based Methods for Speaker Segregation and its Application in Multimedia Information Retrieval
Author :
Nathwani, Karan ; Pandit, Pattabhirama ; Hegde, Rajesh M.
Author_Institution :
Dept. of Electr. Eng., Indian Inst. of Technol., Kanpur, Kanpur, India
Volume :
15
Issue :
6
fYear :
2013
fDate :
Oct. 2013
Firstpage :
1326
Lastpage :
1339
Abstract :
A novel method of single-channel speaker segregation using the group delay cross-correlation function is proposed in this paper. The group delay function, which is the negative derivative of the phase spectrum, yields robust spectral estimates. Hence the group delay spectral estimates are first computed over frequency sub-bands after passing the speech signal through a bank of filters. The filter bank spacing is based on a multi-pitch algorithm that computes the pitch estimates of the competing speakers. An affinity matrix is then computed from the group delay spectral estimates of each frequency sub-band. This affinity matrix represents the correlations of the different sub-bands in the mixed broadband speech signal. The grouping of correlated harmonics present in the mixed speech signal is then carried out by using a new iterative graph cut method. The signals are reconstructed from the respective harmonic groups, which represent individual speakers in the mixed speech signal. Spectrographic masks are then applied to the reconstructed signals to refine their perceptual quality. The quality of separated speech is evaluated using several objective and subjective criteria. Experiments on multi-speaker automatic speech recognition are conducted using mixed speech data from the GRID corpus. A cell phone based multimedia information retrieval system (MIRS) for multi-source meeting environments is also developed.
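The abstract defines the group delay function as the negative derivative of the phase spectrum. A minimal sketch of that definition alone follows, assuming a finite discrete-time signal and a plain DFT. It relies on the standard identity tau(k) = Re(X(k) * conj(Y(k))) / |X(k)|^2, where Y is the DFT of n * x[n], which avoids phase unwrapping. The function names `dft` and `group_delay` are illustrative and do not come from the paper, and the sketch does not attempt the sub-band filter bank, the cross-correlation step, or the graph cut grouping described above.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def group_delay(x, eps=1e-12):
    """Group delay (in samples) of x at each DFT bin.

    Hypothetical helper: computes -d(phase)/d(omega) through the identity
    tau(k) = (X_R * Y_R + X_I * Y_I) / |X|^2, with Y the DFT of n * x[n].
    """
    big_x = dft(x)
    big_y = dft([n * v for n, v in enumerate(x)])
    delays = []
    for xk, yk in zip(big_x, big_y):
        power = abs(xk) ** 2
        if power < eps:
            # Phase is undefined where the spectrum is (near) zero.
            delays.append(float("nan"))
        else:
            delays.append((xk.real * yk.real + xk.imag * yk.imag) / power)
    return delays


# A unit impulse delayed by 3 samples has a group delay of 3 samples
# at every frequency bin.
impulse_delays = group_delay([0, 0, 0, 1, 0, 0, 0, 0])
```

For real speech frames, an FFT routine would normally replace the naive `dft`, and bins near spectral zeros would need smoothing or special handling, since the division by |X|^2 makes the raw estimate spiky there.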
Keywords :
channel bank filters; delays; graph theory; information retrieval systems; iterative methods; matrix algebra; mobile handsets; multimedia systems; signal reconstruction; speaker recognition; GRID corpus; MIRS; affinity matrix; cell phone based multimedia information retrieval system; correlated harmonic grouping; filter bank spacing; frequency subbands; group delay based methods; group delay cross-correlation function; group delay function; group delay spectral estimation; iterative graph cut method; mixed broadband speech signal; mixed speech data; multipitch algorithm; multisource meeting environments; multispeaker automatic speech recognition; negative derivative; objective criteria; perceptual quality; phase spectrum; pitch estimation; signal reconstruction; single channel speaker segregation; spectrographic masks; subjective criteria; Correlation; Delay; Frequency modulation; Harmonic analysis; Indexes; Speech; Time frequency analysis; Group delay cross correlation function; iterative graph cut method; multi-speaker speech recognition; multimedia information retrieval; speaker separation;
fLanguage :
English
Journal_Title :
Multimedia, IEEE Transactions on
Publisher :
IEEE
ISSN :
1520-9210
Type :
jour
DOI :
10.1109/TMM.2013.2247391
Filename :
6470685