Title :
Speaker identification and video analysis for hierarchical video shot classification
Author :
Nam, JeHo ; Çetin, A. Enis ; Tewfik, Ahmed H.
Author_Institution :
Dept. of Electrical Engineering, University of Minnesota, Minneapolis, MN, USA
Abstract :
We present a new video shot classification and clustering technique to support content-based indexing, browsing and retrieval in video databases. The proposed method is based on the analysis of both the audio and visual data tracks. The visual stream is analyzed using a 3-D wavelet transform and segmented into shot units, which are matched and clustered by visual content. Simultaneously, speaker changes are detected by tracking voiced phonemes in the audio signal. The clues obtained from the video and speech data are combined to classify and group the isolated video shots. This integrated approach also allows effective indexing of the audio-visual objects in multimedia databases.
Keywords :
audio signals; image classification; image matching; image segmentation; image sequences; indexing; information retrieval; multimedia computing; query processing; speaker recognition; video signal processing; visual databases; wavelet transforms; 3D wavelet transform; audio data tracks; audio signal; audio-visual objects indexing; browsing; clustering technique; content-based indexing; content-based retrieval; hierarchical video shot classification; multimedia databases; speaker changes detection; speaker identification; speech data; video analysis; video data; video databases; video shots matching; visual data tracks; visual stream analysis; voiced phonemes tracking; Content based retrieval; Electronic mail; Indexing; Information retrieval; Multimedia databases; Noise robustness; Speech; Streaming media; Video sequences; Visual databases; Wavelet analysis; Wavelet transforms;
Conference_Title :
Image Processing, 1997. Proceedings., International Conference on
Conference_Location :
Santa Barbara, CA
Print_ISBN :
0-8186-8183-7
DOI :
10.1109/ICIP.1997.638830