Regional Information Center for Science and Technology - Computationally Efficient and Robust BIC-Based Speaker Segmentation

DocumentCode :

754325

Title :

Computationally Efficient and Robust BIC-Based Speaker Segmentation

Author :

Kotti, Margarita ; Benetos, Emmanouil ; Kotropoulos, Constantine

Author_Institution :

Dept. of Inf., Aristotle Univ. of Thessaloniki, Thessaloniki

Volume :

Issue :

fYear :

2008

fDate :

7/1/2008

Firstpage :

920

Lastpage :

933

Abstract :

An algorithm for automatic speaker segmentation based on the Bayesian information criterion (BIC) is presented. BIC tests are not performed for every window shift, as in previous approaches, but only when a speaker change is most likely to occur. This is done by estimating the next probable change point from a model of utterance durations. The inverse Gaussian distribution is found to fit the distribution of utterance durations best. As a result, fewer BIC tests are needed, which makes the proposed system less demanding in computation time and memory, and considerably more efficient with respect to missed speaker change points. A feature selection algorithm based on a branch-and-bound search strategy is applied in order to identify the most efficient features for speaker segmentation. Furthermore, a new theoretical formulation of BIC is derived by applying centering and simultaneous diagonalization. This formulation is considerably more computationally efficient than the standard BIC when the covariance matrices are estimated by estimators other than the usual maximum-likelihood ones. Two commonly used pairs of figures of merit are employed, and their relationship is established. Computational efficiency is achieved through speaker utterance modeling, whereas robustness is achieved by feature selection and by applying BIC tests at appropriately selected time instants. Experimental results indicate that the proposed modifications yield superior performance compared to existing approaches.
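The record does not give the paper's equations, so the following is only a minimal sketch of two ingredients the abstract names, under stated assumptions. It uses one common form of the standard ΔBIC test for a candidate change point (full-covariance Gaussians, maximum-likelihood covariances, penalty weight λ) and the closed-form maximum-likelihood fit of an inverse Gaussian distribution to utterance durations. It does not implement the authors' centered, simultaneously diagonalized BIC, their branch-and-bound feature selection, or their rule for placing the next test. The function names `delta_bic` and `fit_inverse_gaussian` and the default `lam=1.0` are hypothetical choices made for illustration. It is written in pure Python so it runs without third-party packages.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def ml_covariance(frames: List[Frame]) -> List[List[float]]:
    """Maximum-likelihood (divide-by-N) covariance of a list of feature vectors."""
    n, d = len(frames), len(frames[0])
    mu = [sum(f[j] for f in frames) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in frames:
        c = [f[j] - mu[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += c[a] * c[b] / n
    return cov


def log_det(matrix: List[List[float]]) -> float:
    """log|det| via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    d = len(m)
    total = 0.0
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, d):
            factor = m[r][col] / m[col][col]
            for k in range(col, d):
                m[r][k] -= factor * m[col][k]
    return total


def delta_bic(frames: List[Frame], split: int, lam: float = 1.0) -> float:
    """Standard ΔBIC for a change at `split` (assumed form, not the paper's variant).

    Positive values favour two Gaussians (a speaker change) over one.
    """
    n, d = len(frames), len(frames[0])
    if split <= d or n - split <= d:
        raise ValueError("each side needs more than d frames")
    left, right = frames[:split], frames[split:]
    # Log-likelihood gain of modelling the two segments separately.
    gain = 0.5 * (n * log_det(ml_covariance(frames))
                  - len(left) * log_det(ml_covariance(left))
                  - len(right) * log_det(ml_covariance(right)))
    # Penalty for the extra mean vector and full covariance matrix.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * math.log(n)
    return gain - lam * penalty


def fit_inverse_gaussian(durations: Sequence[float]) -> Tuple[float, float]:
    """Closed-form ML estimates (mean, shape) of an inverse Gaussian distribution."""
    n = len(durations)
    mean = sum(durations) / n
    shape = n / sum(1.0 / x - 1.0 / mean for x in durations)
    return mean, shape
```

A positive `delta_bic` score favours a speaker change at the split. As a hypothetical use of the duration model, the fitted mean utterance duration could serve as a guess for where to run the next test after a detected change; how the paper actually derives the next test point from the fitted distribution is not stated in this record.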

Keywords :

Bayes methods; Gaussian distribution; covariance matrices; maximum likelihood estimation; speech processing; Bayesian information criterion; automatic speaker segmentation; covariance matrices; figures of merit; inverse Gaussian distribution; maximum-likelihood estimation; speaker utterance modeling; Audio recording; Bayesian methods; Covariance matrix; MPEG 7 Standard; Maximum likelihood estimation; NIST; Performance evaluation; Robustness; Speech; System testing; Automatic speaker segmentation; Bayesian information criterion (BIC); inverse Gaussian distribution; simultaneous diagonalization; speaker utterance duration distribution; speech analysis;

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2008.925152

Filename :

4544824

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=754325