DocumentCode :
2951665
Title :
Combining text and audio-visual features in video indexing
Author :
Chang, Shih-Fu ; Manmatha, R. ; Chua, Tat-Seng
Author_Institution :
Dept. of Electr. Eng., Columbia Univ., New York, NY, USA
Volume :
5
fYear :
2005
fDate :
18-23 March 2005
Abstract :
We discuss the opportunities, state of the art, and open research issues in using multi-modal features in video indexing. Specifically, we focus on how imperfect text data obtained by automatic speech recognition (ASR) may be used to help solve challenging problems, such as story segmentation, concept detection, retrieval, and topic clustering. We review the frameworks and machine learning techniques that are used to fuse the text features with audio-visual features. Case studies showing promising performance are described, primarily in the broadcast news video domain.
Keywords :
database indexing; information retrieval; learning (artificial intelligence); speech recognition; text analysis; video databases; ASR; audio-visual features; automatic speech recognition; broadcast news video; concept detection; imperfect text data; machine learning; multi-modal features; retrieval; story segmentation; text features; topic clustering; video indexing; Automatic speech recognition; Computer science; Data mining; Fuses; Indexing; Information retrieval; Layout; Machine learning; Multimedia communication; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-8874-7
Type :
conf
DOI :
10.1109/ICASSP.2005.1416476
Filename :
1416476