DocumentCode
2646431
Title
A Speaker Count System for Telephone Conversations
Author
Ofoegbu, Uchechukwu O. ; Iyer, Ananth N. ; Yantorno, Robert E. ; Smolenski, Brett Y.
Author_Institution
Lab. of Speech Process., Temple Univ., Philadelphia, PA
fYear
2006
fDate
12-15 Dec. 2006
Firstpage
331
Lastpage
334
Abstract
In telephone conversations, only short consecutive utterances can be examined for each speaker. Discriminating between speakers in such conversations is therefore a challenging task, and it becomes even more challenging when no information about the speakers is known a priori. In this paper, a technique for determining the number of speakers participating in a telephone conversation is presented. The approach assumes no prior knowledge about any of the participating speakers. It is based on comparing short utterances within the conversation and deciding whether or not they belong to the same speaker. Applications of this research include three-way call detection and speaker tracking, and the technique could be extended to speaker change-point detection and indexing. The proposed method involves an elimination process in which speech segments matching a chosen set of reference models are sequentially removed from the conversation. Models are formed from the mean vectors and covariance matrices of the linear predictive cepstral coefficients (LPCCs) of voiced segments in the conversation. The use of the Mahalanobis distance, based on likelihood ratio testing, to decide whether two models belong to the same speaker or to different speakers is investigated. After each elimination pass, the relative amount of residual speech is examined to determine whether an additional speaker is present. Experiments were performed on 4000 artificial conversations generated from the HTIMIT database. The proposed system yielded an average speaker count accuracy of 78%.
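The model-comparison and elimination steps summarized in the abstract can be illustrated with a minimal sketch. It assumes the LPCC frames of each voiced segment have already been extracted. The pooled-covariance form of the Mahalanobis distance, the use of the first remaining segment as the reference model, the distance threshold, the 10% residual-speech stopping rule, and all function names (`segment_model`, `mahalanobis_sq`, `count_speakers`) are hypothetical illustrations rather than the authors' published parameters. The abstract does not specify them, and the likelihood-ratio calibration of the threshold is omitted here.

```python
from statistics import fmean

Vector = list[float]
Matrix = list[list[float]]
Model = tuple[Vector, Matrix]  # (mean vector, covariance matrix) of one segment


def segment_model(frames: list[Vector]) -> Model:
    """Mean vector and unbiased covariance matrix of one segment's LPCC frames."""
    n, d = len(frames), len(frames[0])
    mean = [fmean(f[j] for f in frames) for j in range(d)]
    cov = [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in frames) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, d):
            factor = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * d
    for i in range(d - 1, -1, -1):  # back substitution
        x[i] = (m[i][d] - sum(m[i][j] * x[j] for j in range(i + 1, d))) / m[i][i]
    return x


def mahalanobis_sq(model_a: Model, model_b: Model) -> float:
    """Squared Mahalanobis distance between two segment means.
    Assumption: the two covariances are averaged into a pooled covariance."""
    (mu_a, cov_a), (mu_b, cov_b) = model_a, model_b
    diff = [x - y for x, y in zip(mu_a, mu_b)]
    pooled = [[(p + q) / 2 for p, q in zip(ra, rb)] for ra, rb in zip(cov_a, cov_b)]
    return sum(di * yi for di, yi in zip(diff, solve(pooled, diff)))


def count_speakers(models: list[Model], durations: list[float],
                   threshold: float, residual_fraction: float = 0.1) -> int:
    """Elimination loop: each pass takes the first remaining segment as the
    reference, removes every segment within `threshold` of it (including the
    reference itself), and counts one speaker. It stops once the remaining
    speech falls below `residual_fraction` of the total duration."""
    total = sum(durations)
    remaining = list(range(len(models)))
    speakers = 0
    while remaining and sum(durations[i] for i in remaining) / total >= residual_fraction:
        ref = models[remaining[0]]
        remaining = [i for i in remaining if mahalanobis_sq(ref, models[i]) > threshold]
        speakers += 1
    return speakers
```

Given one `segment_model` per voiced segment and the segment durations, `count_speakers(models, durations, threshold)` returns the estimated speaker count. Because the residual-speech check runs before each pass, a short leftover segment (for example, a brief cross-talk fragment) does not by itself add a speaker. In practice, the threshold and the stopping fraction would have to be tuned on held-out conversations.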
Keywords
covariance matrices; speaker recognition; Mahalanobis distance; average speaker count accuracy; elimination process; likelihood ratio testing; linear predictive cepstral coefficients; speaker change-point detection; speaker tracking; speech segments matching; telephone conversations; three-way call detection; Broadcasting; Cepstral analysis; Covariance matrix; Indexing; Laboratories; Signal processing; Speech processing; Telephony; Testing; Vectors;
fLanguage
English
Publisher
ieee
Conference_Title
Intelligent Signal Processing and Communications, 2006. ISPACS '06. International Symposium on
Conference_Location
Yonago
Print_ISBN
0-7803-9732-0
Electronic_ISBN
0-7803-9733-9
Type
conf
DOI
10.1109/ISPACS.2006.364899
Filename
4212286