DocumentCode :
2206848
Title :
Frame level entropy based overlapped speech detection as a pre-processing stage for speaker diarization
Author :
Ben-Harush, Oshry ; Guterman, Hugo ; Lapidot, Itshak
Author_Institution :
Dep. of Electr. & Comput. Eng., Ben-Gurion Univ. of the Negev, Beer-Sheva, Israel
fYear :
2009
fDate :
1-4 Sept. 2009
Firstpage :
1
Lastpage :
6
Abstract :
Speaker diarization systems attempt to assign temporal speech segments in a conversation to the appropriate speaker, and non-speech segments to non-speech. In essence, they answer the question "Who spoke when?". One inherent deficiency of most current systems is their inability to handle co-channel or overlapped speech. During the past few years, several studies have attempted to deal with the problem of overlapped or co-channel speech detection and separation. However, most of the suggested algorithms operate only under specific conditions, have high computational complexity, and require both time- and frequency-domain analysis of the audio data. In this study, frame-based entropy analysis of the audio data in the time domain serves as a single feature for an overlapped speech detection algorithm. Identification of overlapped speech segments is performed using Gaussian Mixture Modeling (GMM) along with well-known classification algorithms applied to two-speaker conversations. This approach eliminates the need to set a hard threshold for each conversation or database. The LDC CALLHOME American English corpus is used to evaluate the suggested algorithm. The proposed method successfully detects 60.0% of the frames labeled as overlapped speech by the baseline (ground-truth) segmentation, while keeping a 5% false-alarm rate.
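The frame-level, time-domain entropy feature mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the entropy is estimated, so everything below is an assumption made for illustration: the amplitude-histogram estimator, the bin count, the amplitude range, and the frame and hop lengths, as well as the hypothetical function names `frame_entropy` and `frame_entropies`. The GMM and classification stage is not shown.

```python
import math


def frame_entropy(frame, n_bins=32, lo=-1.0, hi=1.0):
    """Shannon entropy (in bits) of the amplitude histogram of one frame.

    Assumption: samples are normalized to [lo, hi]; the histogram
    estimator and n_bins are illustrative, not taken from the paper.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in frame:
        # Clamp the sample to the histogram range, then locate its bin.
        idx = int((min(max(x, lo), hi) - lo) / width)
        counts[min(idx, n_bins - 1)] += 1
    total = len(frame)
    # Empty bins are skipped, so log2 is never called on zero.
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def frame_entropies(signal, frame_len=160, hop=80, **kwargs):
    """Entropy of each frame of a sample sequence.

    Assumption: frame_len and hop are placeholder values in samples;
    a trailing partial frame is dropped.
    """
    return [
        frame_entropy(signal[start:start + frame_len], **kwargs)
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
```

The resulting per-frame entropy sequence is the kind of single scalar feature that a mixture model could then be fitted to, which is how a per-conversation hard threshold might be avoided.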
Keywords :
Gaussian processes; computational complexity; entropy; speech processing; temporal databases; time-varying networks; Gaussian mixture modeling; LDC CALLHOME American English corpus; audio data; computational complexity; frame level entropy analysis; frequency domain analysis; overlapped speech detection; speaker diarization system; speech detection; speech separation; time domain analysis; Autocorrelation; Automatic speech recognition; Computational complexity; Detection algorithms; Educational institutions; Entropy; Hidden Markov models; Linear predictive coding; Speech analysis; Speech processing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning for Signal Processing, 2009. MLSP 2009. IEEE International Workshop on
Conference_Location :
Grenoble
Print_ISBN :
978-1-4244-4947-7
Electronic_ISBN :
978-1-4244-4948-4
Type :
conf
DOI :
10.1109/MLSP.2009.5306205
Filename :
5306205