DocumentCode :
2646431
Title :
A Speaker Count System for Telephone Conversations
Author :
Ofoegbu, Uchechukwu O. ; Iyer, Ananth N. ; Yantorno, Robert E. ; Smolenski, Brett Y.
Author_Institution :
Lab. of Speech Process., Temple Univ., Philadelphia, PA
fYear :
2006
fDate :
12-15 Dec. 2006
Firstpage :
331
Lastpage :
334
Abstract :
In telephone conversations, only short consecutive utterances can be examined for each speaker. Discriminating between speakers in such conversations is therefore a challenging task, and it becomes even more challenging when no information about the speakers is known a priori. In this paper, a technique for determining the number of speakers participating in a telephone conversation is presented. This approach assumes no knowledge or information about any of the participating speakers. The technique is based on comparing short utterances within the conversation and deciding whether or not they belong to the same speaker. Applications of this research include three-way call detection and speaker tracking, and the approach could be extended to speaker change-point detection and indexing. The proposed method involves an elimination process in which speech segments matching a chosen set of reference models are sequentially removed from the conversation. Models are formed using the mean vectors and covariance matrices of linear predictive cepstral coefficients of voiced segments in the conversation. The use of the Mahalanobis distance to determine whether two models belong to the same speaker or to different speakers, based on likelihood ratio testing, is investigated. The relative amount of residual speech is observed after each elimination process to determine whether an additional speaker is present. Experiments were performed on 4000 artificial conversations from the HTIMIT database. The proposed system yielded an average speaker count accuracy of 78%.
Keywords :
covariance matrices; speaker recognition; Mahalanobis distance; average speaker count accuracy; elimination process; likelihood ratio testing; linear predictive cepstral coefficients; speaker change-point detection; speaker tracking; speech segments matching; telephone conversations; three-way call detection; Broadcasting; Cepstral analysis; Covariance matrix; Indexing; Laboratories; Signal processing; Speech processing; Telephony; Testing; Vectors
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Signal Processing and Communications, 2006. ISPACS '06. International Symposium on
Conference_Location :
Yonago
Print_ISBN :
0-7803-9732-0
Electronic_ISBN :
0-7803-9733-9
Type :
conf
DOI :
10.1109/ISPACS.2006.364899
Filename :
4212286