  • DocumentCode
    395218
  • Title
    UBM-based real-time speaker segmentation for broadcasting news
  • Author
    Wu, TingYao ; Lu, Lie ; Chen, Ke ; Zhang, Hong-Jiang
  • Author_Institution
    Peking Univ., Beijing, China
  • Volume
    2
  • fYear
    2003
  • fDate
    6-10 April 2003
  • Abstract
    This paper addresses the problem of real-time speaker change detection in broadcast news, in which no prior knowledge of the speakers is assumed. Our speaker segmentation is a "coarse to refine" process, which consists of two stages: pre-segmentation and refinement. In the pre-segmentation stage, a new approach based on the Gaussian mixture model-universal background model (GMM-UBM) is proposed to categorize feature vectors into three sets, i.e., a reliable speaker-related set, a doubtful speaker-related set and an unreliable speaker-related set, in order to enhance the effect of the reliable speaker-related feature vectors. Potential speaker change boundaries are then detected based on a novel distance measure. In the refinement stage, incremental speaker adaptation (ISA), which is suitable for real-time requirements, is proposed to obtain considerably more precise speaker models so that the potential speaker change boundaries can be confirmed and refined. Experimental results demonstrate that our approach yields satisfactory performance.
  • Keywords
    Gaussian distribution; broadcasting; feature extraction; speaker recognition; GMM-UBM; Gaussian mixture model-universal background model; broadcast news; change boundary detection; coarse to refine process; distance measure; doubtful speaker-related set; feature vectors; incremental speaker adaptation; performance; pre-segmentation; real-time speaker segmentation; refinement; reliable speaker-related set; speaker change detection; unreliable speaker-related set; Asia; Broadcasting; Costs; Indexing; Instruction sets; Iterative algorithms; Iterative methods; Real time systems; Speech; Streaming media;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7663-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.2003.1202327
  • Filename
    1202327