DocumentCode :
2436071
Title :
Unsupervised Indexing of Conversations with Short Speaker Utterances
Author :
Ofoegbu, Uchechukwu O. ; Iyer, Ananth N. ; Yantorno, Robert E. ; Wenndt, Stanley J.
Author_Institution :
Temple Univ., Philadelphia
fYear :
2007
fDate :
3-10 March 2007
Firstpage :
1
Lastpage :
11
Abstract :
Two speaker indexing systems for conversations are presented in this paper. The first method involves indexing two-speaker conversations. In this method, two reference models are judiciously chosen from the conversation such that they represent the two different speakers. Models are then matched to the reference speakers using distance-based comparisons. The second technique is based on first determining the number of participants in the conversation using a speaker count method termed the "Residual Ratio Algorithm" (RRA), and then indexing based on this count. The RRA involves an elimination process in which speech segments matching a chosen set of reference models are sequentially removed from the conversation, and the relative amount of residual speech is observed to determine the count. The distance measures used in comparing models include the Bhattacharyya distance, the T-square statistic, and the Mahalanobis distance. The speaker comparison decisions of all three distances are combined to improve the accuracy of the system. Linear Predictive Cepstral Coefficients of voiced phonemes are used in forming speaker models. The two-speaker indexing technique yielded an indexing accuracy of up to 95% when evaluated using SWITCHBOARD data. The counting-indexing technique resulted in a maximum indexing accuracy of about 91% when tested on artificial conversations generated from HTIMIT data.
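The distance-based comparison named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each speech segment is a list of feature vectors (standing in for cepstral coefficients), uses diagonal covariances to stay within the standard library, and combines the three decisions by majority vote. The abstract does not say how the decisions are combined, so the vote, the thresholds, and all function names are hypothetical.

```python
import math
from statistics import fmean, variance

def column_stats(frames):
    """Per-dimension mean, sample variance, and frame count of a segment."""
    cols = list(zip(*frames))
    return [fmean(c) for c in cols], [variance(c) for c in cols], len(frames)

def mahalanobis_sq(a, b):
    """Squared Mahalanobis distance between segment means (pooled diagonal covariance)."""
    (ma, va, na), (mb, vb, nb) = column_stats(a), column_stats(b)
    pooled = [((na - 1) * x + (nb - 1) * y) / (na + nb - 2) for x, y in zip(va, vb)]
    return sum((p - q) ** 2 / s for p, q, s in zip(ma, mb, pooled))

def hotelling_t2(a, b):
    """Two-sample Hotelling T-square statistic: a sample-size-scaled Mahalanobis distance."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * mahalanobis_sq(a, b)

def bhattacharyya(a, b):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    (ma, va, _), (mb, vb, _) = column_stats(a), column_stats(b)
    total = 0.0
    for p, q, x, y in zip(ma, mb, va, vb):
        s = (x + y) / 2  # averaged variance for this dimension
        total += 0.125 * (p - q) ** 2 / s + 0.5 * math.log(s / math.sqrt(x * y))
    return total

def same_speaker(a, b, thresholds):
    """Assumed fusion rule: at least two of the three distances fall below their thresholds."""
    scores = (mahalanobis_sq(a, b), hotelling_t2(a, b), bhattacharyya(a, b))
    votes = sum(score < limit for score, limit in zip(scores, thresholds))
    return votes >= 2
```

In practice the thresholds would need to be tuned on held-out data, and full covariance matrices may separate speakers better than the diagonal simplification used here.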
Keywords :
indexing; speech processing; statistical analysis; SWITCHBOARD data; T-square statistics; counting-indexing technique; distance-based comparisons; elimination process; linear predictive cepstral coefficients; reference models; residual ratio algorithm; speaker count method; speaker indexing technique; speaker utterances; speech segments matching; unsupervised indexing; Biographies; Cepstral analysis; Indexing; Laboratories; Predictive models; Speech analysis; Speech processing; Statistics; Telephony; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Aerospace Conference, 2007 IEEE
Conference_Location :
Big Sky, MT
ISSN :
1095-323X
Print_ISBN :
1-4244-0524-6
Electronic_ISBN :
1095-323X
Type :
conf
DOI :
10.1109/AERO.2007.352977
Filename :
4161417