DocumentCode :
902949
Title :
Automatic multimedia indexing: combining audio, speech, and visual information to index broadcast news
Author :
Ohtsuki, Katsutoshi ; Bessho, Katsuji ; Matsuo, Yoshihiro ; Matsunaga, Shoichi ; Hayashi, Yoshihiko
Volume :
23
Issue :
2
fYear :
2006
fDate :
3/1/2006
Firstpage :
69
Lastpage :
78
Abstract :
This paper describes an indexing system that automatically creates metadata for multimedia broadcast news content by integrating audio, speech, and visual information. The system comprises acoustic segmentation (AS), automatic speech recognition (ASR), topic segmentation (TS), and video indexing features. In the AS module, new spectral-based features and a smoothing method improved the detection of speech in the audio stream of the input news content. In the speech recognition module, automatic selection of acoustic models achieved both a low word error rate (WER), as with parallel recognition using multiple acoustic models, and fast recognition, as with a single acoustic model. The TS method using word concept vectors achieved more accurate results than the conventional method using local word frequency vectors. The information integration module integrates the results from the AS, TS, and SC modules. Combining the TS results with the AS and SC results improved story boundary detection accuracy over the TS results alone.
Keywords :
acoustic signal processing; audio signal processing; database indexing; multimedia databases; speech recognition; acoustic segmentation; audio information; automatic multimedia content indexing system; automatic speech recognition; boundary detection accuracy; information integration module; multimedia broadcast news metadata; spectral-based features; speech detection performance; speech information; topic segmentation; video indexing features; visual information; word frequency vectors; Acoustic signal detection; Automatic speech recognition; Digital multimedia broadcasting; Frequency; Indexing; Multimedia communication; Multimedia systems; Smoothing methods; Speech recognition; Streaming media;
fLanguage :
English
Journal_Title :
Signal Processing Magazine, IEEE
Publisher :
IEEE
ISSN :
1053-5888
Type :
jour
DOI :
10.1109/MSP.2006.1621450
Filename :
1621450