DocumentCode :
417648
Title :
News video story segmentation using fusion of multi-level multi-modal features in TRECVID 2003
Author :
Hsu, W. ; Kennedy, L. ; Huang, C.-W. ; Chang, S.-F. ; Lin, C.-Y. ; Iyengar, G.
Author_Institution :
Dept. of Electr. Eng., Columbia Univ., New York, NY, USA
Volume :
3
fYear :
2004
fDate :
17-21 May 2004
Abstract :
We present new results in news video story segmentation and classification in the context of the TRECVID 2003 video retrieval benchmarking event. We applied and extended the maximum entropy statistical model to fuse diverse features from multiple levels and modalities, including visual, audio, and text. The features include motion, face, music/speech types, prosody, and high-level text segmentation information. The statistical fusion model is used to automatically discover the relevant features that contribute to the detection of story boundaries. One novel aspect of our method is the use of a feature wrapper to handle different types of features: asynchronous, discrete, continuous, and delta features. We also developed several novel features related to prosody. Using the large news video set from the TRECVID 2003 benchmark, we demonstrate satisfactory performance (F1 measure up to 0.76) and, more importantly, observe an interesting opportunity for further improvement.
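The abstract reports story segmentation quality as an F1 measure, the harmonic mean of precision and recall over detected story boundaries. The following is a minimal sketch of how such a score could be computed, not the evaluation code used by the authors or by TRECVID. The function name `boundary_f1`, the greedy nearest-match rule, and the `tolerance` default of 5 seconds are assumptions made for illustration.

```python
def boundary_f1(detected, reference, tolerance=5.0):
    """Return (precision, recall, f1) for story boundary times in seconds.

    Hypothetical helper: a detected boundary counts as correct if it lies
    within `tolerance` seconds of a reference boundary that has not already
    been matched. The tolerance value is an assumption, not from the source.
    """
    matched = set()  # indices of reference boundaries already claimed
    true_positives = 0

    for d in sorted(detected):
        best = None
        for i, r in enumerate(reference):
            if i in matched or abs(d - r) > tolerance:
                continue
            # Keep the closest unmatched reference boundary.
            if best is None or abs(d - r) < abs(d - reference[best]):
                best = i
        if best is not None:
            matched.add(best)
            true_positives += 1

    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative (made-up) boundary times, in seconds.
detected = [10.0, 42.0, 90.0, 130.0]
reference = [12.0, 60.0, 91.0]
precision, recall, f1 = boundary_f1(detected, reference)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high, so a detector that over-segments or under-segments cannot reach a high score.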
Keywords :
audio signal processing; feature extraction; image classification; image retrieval; image segmentation; maximum entropy methods; statistical analysis; video signal processing; TRECVID 2003; audio features; feature wrapper; maximum entropy statistical model; motion features; multi-level features; multi-modal features; news story classification; news story segmentation; prosody; statistical fusion model; text features; video classification; video retrieval; video segmentation; visual features; Animation; Automatic speech recognition; Broadcasting; Cellular neural networks; Entropy; Face detection; Fuses; Hidden Markov models; Music information retrieval; Performance gain;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-8484-9
Type :
conf
DOI :
10.1109/ICASSP.2004.1326627
Filename :
1326627