Robust anchorperson detection based on audio streams using a hybrid I-vector and DNN system

Author

Yun-Fan Chang ; Payton Lin ; Shao-Hua Cheng ; Kai-Hsuan Chan ; Yi-Chong Zeng ; Chia-Wei Liao ; Wen-Tsung Chang ; Yu-Chiang Wang ; Yu Tsao

Author_Institution

Research Center for Information Technology Innovation, Taipei, Taiwan

fYear

2014

fDate

9-12 Dec. 2014

Firstpage

1

Lastpage

4

Abstract

Anchorperson segment detection enables efficient video content indexing for information retrieval. Anchorperson detection based on audio analysis has gained popularity because of its lower computational complexity and satisfactory performance. This paper presents a robust framework that uses a hybrid I-vector and deep neural network (DNN) system to perform anchorperson detection on the audio streams of video content. The proposed system first applies I-vector extraction to obtain speaker identity features from the audio data. A DNN classifier then uses these features to verify the claimed anchorperson identity. In addition, subspace feature normalization (SFN) is incorporated into the hybrid system for robust feature extraction, compensating for audio mismatch caused by recording devices. An anchorperson verification experiment was conducted to evaluate the equal error rate (EER) of the proposed hybrid system. Experimental results demonstrate that the proposed system outperforms the state-of-the-art hybrid I-vector and support vector machine (SVM) system. Integrating SFN further improves the proposed system by effectively compensating for audio mismatch in anchorperson detection tasks.
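
The abstract reports results as an equal error rate (EER), the operating point at which the false-accept rate and the false-reject rate of a verification system are equal. As a minimal sketch of how that metric can be computed from verification scores (not the authors' evaluation code): the function name `equal_error_rate`, the threshold sweep over observed scores, and the toy scores below are all assumptions made for illustration and do not come from the paper.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    target_scores:   scores for trials where the speaker IS the claimed anchorperson
    impostor_scores: scores for trials where the speaker is NOT the anchorperson
    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where the two error rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False-accept rate: impostor trials that pass the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False-reject rate: genuine trials that fall below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the mean of the two rates at the closest crossing point.
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Hypothetical scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(eer, threshold)  # 0.25 0.6
```

With a finite set of trials the two error rates may never match exactly, so this sketch reports their average at the closest threshold; evaluation toolkits often interpolate between neighbouring thresholds instead.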

Keywords

audio signal processing; computational complexity; feature extraction; image classification; learning (artificial intelligence); neural nets; speaker recognition; vectors; video retrieval; DNN classifier; DNN system; SFN; anchorperson verification experiment; audio analysis; audio mismatch issues; audio streams; deep neural network system; equal error rate evaluation; hybrid I-vector; information retrieval; recording devices; robust anchorperson segment detection; speaker identity feature extraction; subspace feature normalization; video content indexing; Abstracts; Decision support systems; Feature extraction; Indexing; Information retrieval; Robustness; Support vector machines

fLanguage

English

Publisher

IEEE

Conference_Titel

Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)

Conference_Location

Siem Reap

Type

conf

DOI

10.1109/APSIPA.2014.7041717

Filename

7041717