• DocumentCode
    2646431
  • Title
    A Speaker Count System for Telephone Conversations
  • Author
    Ofoegbu, Uchechukwu O. ; Iyer, Ananth N. ; Yantorno, Robert E. ; Smolenski, Brett Y
  • Author_Institution
    Lab. of Speech Process., Temple Univ., Philadelphia, PA
  • fYear
    2006
  • fDate
    12-15 Dec. 2006
  • Firstpage
    331
  • Lastpage
    334
  • Abstract
    In telephone conversations, only short consecutive utterances can be examined for each speaker; therefore, discriminating between speakers in such conversations is a challenging task, which becomes even more challenging when no information about the speakers is known a priori. In this paper, a technique for determining the number of speakers participating in a telephone conversation is presented. This approach assumes no knowledge or information about any of the participating speakers. The technique is based on comparing short utterances within the conversation and deciding whether or not they belong to the same speaker. Applications of this research include three-way call detection and speaker tracking, and the approach could be extended to speaker change-point detection and indexing. The proposed method involves an elimination process in which speech segments matching a chosen set of reference models are sequentially removed from the conversation. Models are formed using the mean vectors and covariance matrices of linear predictive cepstral coefficients of voiced segments in the conversation. The use of the Mahalanobis distance to determine whether two models belong to the same or to different speakers, based on likelihood ratio testing, is investigated. The relative amount of residual speech is observed after each elimination process to determine whether an additional speaker is present. Experimentation was performed on 4000 artificial conversations from the HTIMIT database. The proposed system was able to yield an average speaker count accuracy of 78%.
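    The elimination idea in the abstract can be illustrated with a minimal Python/NumPy sketch. It assumes linear predictive cepstral coefficient (LPCC) feature matrices have already been extracted for each voiced segment (rows are frames, columns are coefficients). The function names, the pooled-covariance form of the Mahalanobis distance, the use of the first remaining segment as the reference model, and the `threshold` and `residual_min` values are all hypothetical illustrations, not the authors' published parameters or exact statistic.

```python
import numpy as np


def make_model(features):
    """Model a voiced segment by the mean vector and covariance matrix of
    its LPCC frames (rows = frames, columns = coefficients).
    Assumes each segment has enough frames for a usable covariance."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    return mu, cov


def mahalanobis(model_a, model_b):
    """Mahalanobis distance between two model means under the pooled
    (averaged) covariance. Pooling is an assumption for this sketch."""
    mu_a, cov_a = model_a
    mu_b, cov_b = model_b
    diff = mu_a - mu_b
    pooled = 0.5 * (cov_a + cov_b)
    # solve() avoids forming an explicit matrix inverse
    return float(np.sqrt(diff @ np.linalg.solve(pooled, diff)))


def count_speakers(segments, threshold, residual_min=0.1, max_speakers=10):
    """Hypothetical elimination loop for counting speakers.

    segments:     list of (frames x coeffs) LPCC arrays, one per voiced segment
    threshold:    distance below which two models are treated as one speaker
    residual_min: fraction of remaining speech below which no further
                  speaker is counted
    """
    total = sum(len(s) for s in segments)
    remaining = list(segments)
    count = 0
    while remaining and count < max_speakers:
        # Use the first remaining segment as the reference model (assumption).
        ref = make_model(remaining[0])
        # Eliminate the reference and every segment that matches it.
        remaining = [s for s in remaining[1:]
                     if mahalanobis(ref, make_model(s)) >= threshold]
        count += 1
        # Relative amount of residual speech after this elimination pass.
        residual = sum(len(s) for s in remaining) / total
        if residual < residual_min:
            break
    return count
```

    In use, `count_speakers` would receive the segments of one conversation and return an estimated speaker count; on synthetic Gaussian "speakers" with well-separated means it recovers the number of distinct means. The threshold on the distance plays the role of the likelihood-ratio decision mentioned in the abstract, and would in practice have to be tuned on development data.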
  • Keywords
    covariance matrices; speaker recognition; Mahalanobis distance; average speaker count accuracy; elimination process; likelihood ratio testing; linear predictive cepstral coefficients; speaker change-point detection; speaker tracking; speech segments matching; telephone conversations; three-way call detection; Broadcasting; Cepstral analysis; Covariance matrix; Indexing; Laboratories; Signal processing; Speech processing; Telephony; Testing; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Signal Processing and Communications, 2006. ISPACS '06. International Symposium on
  • Conference_Location
    Yonago
  • Print_ISBN
    0-7803-9732-0
  • Electronic_ISBN
    0-7803-9733-9
  • Type
    conf
  • DOI
    10.1109/ISPACS.2006.364899
  • Filename
    4212286