  • DocumentCode
    144890
  • Title
    A comparison of distance measures for clustering in speaker diarization
  • Author
    de Campos Niero, Marcelo; de Lima Veiga Filho, Alvaro; Adami, Andre Gustavo
  • Author_Institution
    Dept. of Electr. Eng., Pontifical Catholic Univ. of Rio de Janeiro, Rio de Janeiro, Brazil
  • fYear
    2014
  • fDate
    17-20 Aug. 2014
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Speaker diarization consists of answering the question “Who spoke when?” for a given audio recording, such as a telephone call, a meeting, or a broadcast news program, without any prior information about either the audio or the speakers. The speaker diarization task emerged as a way to optimize audio information retrieval by detecting and tracking speech and speaker information. Computationally, diarization proceeds through four main steps: signal feature extraction, speech/non-speech detection, segmentation, and clustering. In this work, the clustering step is analyzed by comparing distance measures commonly used in current speaker diarization systems. The results show that pairs of clusters with a large difference in the number of data samples are more sensitive to errors, that the number of mixtures in an external model affects the discriminative power of the distance measures, and that the number of estimated parameters affects speaker discrimination. All experiments are performed on an excerpt from the TIMIT corpus and on the diarization task database used in the 2002 NIST Speaker Recognition Evaluation.
  • Keywords
    feature extraction; parameter estimation; pattern clustering; speech recognition; NIST speaker recognition evaluation; TIMIT corpus; audio information retrieval processing; broadcast news; clusters; data samples; distance measures; nonspeech detection; signal feature extraction; speaker diarization; speaker discrimination; speech detection; speech tracking; Adaptation models; Conferences; Databases; NIST; Power measurement; Speech; Speech processing; speaker clustering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2014 International Telecommunications Symposium (ITS)
  • Conference_Location
    São Paulo
  • Type
    conf
  • DOI
    10.1109/ITS.2014.6947954
  • Filename
    6947954