Title :
Speech retrieval with video parsing for television news programs
Author :
Meng, Helen M. ; Tang, Xiaoou ; Hui, Pui Yu ; Gao, Xinbo ; Li, Yuk Chi
Author_Institution :
Human-Computer Commun. Lab., Chinese Univ. of Hong Kong, Shatin, China
Abstract :
We have been working on speech retrieval from Chinese (Cantonese) television news programs. The use of automatic speech recognition for audio indexing produces imperfect transcriptions, and recognition errors affect retrieval performance. A news story typically contains a brief report by the anchor person(s) in the studio, as well as news footage from the field. Investigation shows that our recognizer performs better when indexing audio from the studio, compared to that from the field. In order to automatically extract the "reliable" audio segments for speech retrieval, we attempt to detect studio-to-field transitions by means of video parsing. Our study is based on 146 news stories collected from local television Cantonese news programs. We formulated a known-item retrieval task and adopted the average inverse rank (AIR) as our evaluation metric. Retrieval is performed based on syllable bigram units, augmented with skipped syllable bigrams. Retrieval using the entire audio track of each news story gave AIR=0.759. With the incorporation of video parsing, we performed retrieval based only on the studio recordings, which produced AIR=0.768.
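The two technical ingredients named in the abstract, syllable bigram indexing units with skipped bigrams and the average inverse rank, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that AIR is the mean of 1/rank of the known item over all queries (the usual reading of the term; the abstract does not spell out the formula), and that a skipped bigram pairs two syllables separated by up to `max_skip` intervening syllables. The function names, the `max_skip` parameter, and the romanized syllables are all hypothetical.

```python
from typing import List, Sequence, Tuple


def syllable_bigrams(syllables: Sequence[str], max_skip: int = 1) -> List[Tuple[str, str]]:
    """Return adjacent syllable bigrams followed by skipped bigrams.

    A gap of 1 yields ordinary adjacent bigrams; larger gaps pair syllables
    with up to `max_skip` syllables skipped in between. The skip distance is
    an assumption, since the abstract does not state one.
    """
    units: List[Tuple[str, str]] = []
    for gap in range(1, max_skip + 2):
        for i in range(len(syllables) - gap):
            units.append((syllables[i], syllables[i + gap]))
    return units


def average_inverse_rank(ranks: Sequence[int]) -> float:
    """Mean of 1/rank, where each rank is the 1-based position at which the
    known item for one query appears in the retrieved list (assumed definition)."""
    if not ranks:
        raise ValueError("at least one query rank is required")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be positive")
    return sum(1.0 / r for r in ranks) / len(ranks)


if __name__ == "__main__":
    # Hypothetical romanized syllables, for illustration only.
    transcript = ["hoeng1", "gong2", "san1", "man4"]
    print(syllable_bigrams(transcript, max_skip=1))
    # Three hypothetical queries whose target stories ranked 1st, 2nd and 4th.
    print(average_inverse_rank([1, 2, 4]))
```

Under this definition an AIR of 1.0 means every target story was ranked first, so the reported values of 0.759 and 0.768 can be read on a scale from 0 to 1.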
Keywords :
content-based retrieval; database indexing; feature extraction; multimedia databases; speech recognition; video signal processing; Cantonese speech; Chinese speech; audio indexing; audio segments; automatic extraction; automatic speech recognition; average inverse rank; known-item retrieval task; multimedia information retrieval; retrieval evaluation; skipped syllable bigrams; speech retrieval; studio-to-field transitions; syllable bigram units; television news programs; video parsing; Automatic speech recognition; Content based retrieval; Digital audio broadcasting; Digital video broadcasting; Indexing; Information retrieval; Laboratories; Natural languages; TV broadcasting; Video on demand;
Conference_Title :
Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on
Conference_Location :
Salt Lake City, UT
Print_ISBN :
0-7803-7041-4
DOI :
10.1109/ICASSP.2001.941191