DocumentCode :
3461884
Title :
Multimedia fusion in automatic extraction of studio speech segments for spoken document retrieval
Author :
Hui, Pui Yu ; Lo, Wai Kit ; Meng, Helen M.
Author_Institution :
Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, Shatin, China
Volume :
5
fYear :
2003
fDate :
6-10 April 2003
Abstract :
The paper describes our progress in Cantonese spoken document retrieval. Over 60 hours of Cantonese television news broadcasts have been collected as part of the AoE-IT Multimedia Repository. We have also developed the multimedia markup language (MmML) for annotating the multimedia content in terms of anchor/field video frames and audio recordings. The audio tracks are indexed by a Cantonese syllable recognizer. Our investigation indicates that there is a large discrepancy in recognition performance, i.e. dropping from 59% to 39% in syllable accuracy (and corresponding reliability in audio indexing), as we move from anchor speech recorded in the studio to reporter/interview speech recorded in the field. Hence we present several automatic methods to extract anchor/studio speech from the audio tracks for retrieval: (i) extraction based only on video information using a fuzzy c-means algorithm; (ii) extraction based only on audio information using Gaussian mixture models; (iii) a fusion strategy that combines video- and audio-based extraction. The paper presents the performance of various extraction techniques and the related retrieval performance in a known-item spoken document retrieval task.
Keywords :
Gaussian processes; content-based retrieval; fuzzy systems; image retrieval; information retrieval; multimedia databases; speech recognition; video signal processing; Cantonese syllable recognizer; Gaussian mixture models; anchor speech; audio information; audio recordings; automatic studio speech segment extraction; fuzzy c-means algorithm; interview speech; multimedia fusion; multimedia markup language; reporter speech; spoken document retrieval; television news broadcasts; video frames; video information; Audio recording; Automatic speech recognition; Data mining; Digital multimedia broadcasting; Indexing; Information retrieval; Markup languages; Multimedia communication; Speech recognition; TV broadcasting;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-7663-3
Type :
conf
DOI :
10.1109/ICASSP.2003.1200073
Filename :
1200073