• DocumentCode
    2645273
  • Title
    A Speaker Identification System for Video Content Analysis
  • Author
    Bi, Jing ; Liu, Shu-Chang
  • Author_Institution
    Beijing Univ. of Posts & Telecommun., Beijing
  • fYear
    2008
  • fDate
    15-17 Aug. 2008
  • Firstpage
    200
  • Lastpage
    203
  • Abstract
    Recent literature increasingly proposes applying audio content analysis techniques to content-based video parsing. This paper presents our current work on a speaker identification system for video content analysis. The system differs from conventional ones in three respects: first, the soundtrack extracted from a video stream includes not only silence and speech but also music and environmental sound; second, the number of speakers in the video content is uncertain; third, the presence of noise in the video can significantly degrade system performance. To address these considerations, our speaker identification system comprises the following basic parts: audio classification and segmentation using rule-based and support vector machine (SVM) based classifiers; speech clustering using a spectral clustering technique and speaker identification based on a Gaussian mixture model (GMM); and speech enhancement based on spectral subtraction. Experiments are carried out on a database extracted from news, conversation, and movie videos. The results obtained confirm the validity of the proposed system architecture.
  • Keywords
    Gaussian processes; content-based retrieval; pattern classification; pattern clustering; speaker recognition; speech enhancement; support vector machines; video signal processing; video streaming; Gaussian mixture model; audio classification; audio segmentation; content-based video parsing; rule-based classifier; speaker identification system; spectral clustering technique; speech clustering; support vector machine; video content analysis; video stream; Acoustic noise; Databases; Loudspeakers; Music; Speech enhancement; Streaming media; Support vector machine classification; Support vector machines; System performance; Working environment noise; Audio Classification and Segmentation; Speaker Identification; Speech Enhancement; Video Parsing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Information Hiding and Multimedia Signal Processing, 2008. IIHMSP '08 International Conference on
  • Conference_Location
    Harbin
  • Print_ISBN
    978-0-7695-3278-3
  • Type
    conf
  • DOI
    10.1109/IIH-MSP.2008.215
  • Filename
    4604039