Regional Information Center for Science and Technology (RICEST) - Multimedia fusion in automatic extraction of studio speech segments for spoken document retrieval

DocumentCode :

3461884

Title :

Multimedia fusion in automatic extraction of studio speech segments for spoken document retrieval

Author :

Hui, Pui Yu ; Lo, Wai Kit ; Meng, Helen M.

Author_Institution :

Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, Shatin, China

Year :

2003

Date :

6-10 April 2003

Abstract :

The paper describes our progress in Cantonese spoken document retrieval. Over 60 hours of Cantonese television news broadcasts have been collected as part of the AoE-IT Multimedia Repository. We have also developed the multimedia markup language (MmML) for annotating the multimedia content in terms of anchor/field video frames and audio recordings. The audio tracks are indexed by a Cantonese syllable recognizer. Our investigation reveals a large discrepancy in recognition performance: syllable accuracy (and the corresponding reliability of audio indexing) drops from 59% to 39% as we move from anchor speech recorded in the studio to reporter/interview speech recorded in the field. Hence we present several automatic methods to extract anchor/studio speech from the audio tracks for retrieval: (i) extraction based only on video information using a fuzzy c-means algorithm; (ii) extraction based only on audio information using Gaussian mixture models; and (iii) a fusion strategy that combines video- and audio-based extraction. The paper presents the performance of the various extraction techniques and the related retrieval performance in a known-item spoken document retrieval task.

Keywords :

Gaussian processes; content-based retrieval; fuzzy systems; image retrieval; information retrieval; multimedia databases; speech recognition; video signal processing; Cantonese syllable recognizer; Gaussian mixture models; anchor speech; audio information; audio recordings; automatic studio speech segment extraction; fuzzy c-means algorithm; interview speech; multimedia fusion; multimedia markup language; reporter speech; spoken document retrieval; television news broadcasts; video frames; video information; Audio recording; Automatic speech recognition; Data mining; Digital multimedia broadcasting; Indexing; Information retrieval; Markup languages; Multimedia communication; Speech recognition; TV broadcasting;

Language :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on

ISSN :

1520-6149

Print_ISBN :

0-7803-7663-3

Type :

conf

DOI :

10.1109/ICASSP.2003.1200073

Filename :

1200073

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3461884