Regional Information Center for Science and Technology (RICeST) - Frame level entropy based overlapped speech detection as a pre-processing stage for speaker diarization

DocumentCode :

2206848

Title :

Frame level entropy based overlapped speech detection as a pre-processing stage for speaker diarization

Author :

Ben-Harush, Oshry ; Guterman, Hugo ; Lapidot, Itshak

Author_Institution :

Dep. of Electr. & Comput. Eng., Ben-Gurion Univ. of the Negev, Beer-Sheva, Israel

fYear :

2009

fDate :

1-4 Sept. 2009

Firstpage :

Lastpage :

Abstract :

Speaker diarization systems attempt to assign temporal speech segments in a conversation to the appropriate speaker, and non-speech segments to non-speech. In essence, they answer the question "Who spoke when?". One inherent deficiency of most current systems is their inability to handle co-channel or overlapped speech. During the past few years, several studies have attempted to deal with the problem of overlapped or co-channel speech detection and separation; however, most of the suggested algorithms work only under specific conditions, have high computational complexity, and require both time- and frequency-domain analysis of the audio data. In this study, frame-based entropy analysis of the audio data in the time domain serves as the single feature for an overlapped speech detection algorithm. Overlapped speech segments are identified using Gaussian Mixture Modeling (GMM) along with well-known classification algorithms applied to two-speaker conversations. This methodology eliminates the need to set a hard threshold for each conversation or database. The LDC CALLHOME American English corpus is used to evaluate the suggested algorithm. The proposed method successfully detects 60.0% of the frames labeled as overlapped speech by the baseline (ground-truth) segmentation, while keeping a 5% false-alarm rate.
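As an illustration of the single time-domain feature the abstract describes, the following is a minimal sketch of a per-frame entropy computation. The abstract does not give the estimator, so everything specific here is an assumption: the histogram-based Shannon entropy, the hypothetical function names `frame_entropy` and `frame_entropies`, and the values for bin count, amplitude range, frame length, and hop size. The GMM classification stage is not sketched.

```python
import math


def frame_entropy(frame, n_bins=32, lo=-1.0, hi=1.0):
    """Shannon entropy (in bits) of the amplitude histogram of one frame.

    Assumption: samples are normalized to [lo, hi]; the histogram
    estimator and n_bins are illustrative, not taken from the paper.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in frame:
        idx = int((x - lo) / width)
        idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range samples
        counts[idx] += 1
    total = len(frame)
    # Empty bins contribute nothing to the sum.
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def frame_entropies(signal, frame_len=200, hop=100, **kwargs):
    """Entropy of each full frame of `signal`, stepping by `hop` samples.

    Assumption: frame_len and hop are placeholder values.
    """
    return [
        frame_entropy(signal[i:i + frame_len], **kwargs)
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]
```

The resulting sequence of one entropy value per frame is the kind of one-dimensional feature that a Gaussian mixture model could then be fitted to, which is how a per-conversation hard threshold would be avoided.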

Keywords :

Gaussian processes; computational complexity; entropy; speech processing; temporal databases; time-varying networks; Gaussian mixture modeling; LDC CALLHOME American English corpus; audio data; computational complexity; frame level entropy analysis; frequency domain analysis; overlapped speech detection; speaker diarization system; speech detection; speech separation; time domain analysis; Autocorrelation; Automatic speech recognition; Computational complexity; Detection algorithms; Educational institutions; Entropy; Hidden Markov models; Linear predictive coding; Speech analysis; Speech processing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Machine Learning for Signal Processing, 2009. MLSP 2009. IEEE International Workshop on

Conference_Location :

Grenoble

Print_ISBN :

978-1-4244-4947-7

Electronic_ISBN :

978-1-4244-4948-4

Type :

conf

DOI :

10.1109/MLSP.2009.5306205

Filename :

5306205

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2206848