Regional Information Center for Science and Technology (RICEST) - SAVE: A framework for semantic annotation of visual events

DocumentCode :

2115398

Title :

SAVE: A framework for semantic annotation of visual events

Author :

Lee, Mun Wai ; Hakeem, Asaad ; Haering, Niels ; Zhu, Song-Chun

Author_Institution :

ObjectVideo, Reston, VA

fYear :

2008

fDate :

23-28 June 2008

Firstpage :

Lastpage :

Abstract :

In this paper we propose a framework that performs automatic semantic annotation of visual events (SAVE). This is an enabling technology for content-based video annotation, query and retrieval, with applications in Internet video search and video data mining. The method involves identifying objects in the scene, describing their inter-relations, detecting events of interest, and representing them semantically in a human-readable and queryable format. The SAVE framework is composed of three main components. The first component is an image parsing engine that performs scene content extraction using bottom-up image analysis and a stochastic attribute image grammar, where we define a visual vocabulary from pixels, primitives, parts, objects and scenes, and specify their spatio-temporal or compositional relations; a combined bottom-up and top-down strategy is used for inference. The second component is an event inference engine, where the video event markup language (VEML) is adopted for semantic representation, and a grammar-based approach is used for event analysis and detection. The third component is a text generation engine that generates a text report using head-driven phrase structure grammar (HPSG). The main contribution of this paper is a framework for an end-to-end system that infers visual events and annotates a large collection of videos. Experiments with maritime and urban scenes indicate the feasibility of the proposed approach.
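
The three-component structure described in the abstract can be pictured with a minimal Python sketch. It is only an illustration of how the stages hand data to one another, not the authors' implementation: every name here (SceneObject, Event, parse_frames, infer_events, generate_text) is hypothetical, the image parsing stage is replaced by pre-labelled toy frames, the event grammar by one hand-written rule, and the HPSG text generator by a sentence template.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SceneObject:
    """One detected object at one time step (hypothetical structure)."""
    name: str
    region: str
    time: int


@dataclass
class Event:
    """A detected event in a simplified, VEML-like form (hypothetical)."""
    agent: str
    action: str
    location: str
    time: int


def parse_frames(frames: List[dict]) -> List[SceneObject]:
    """Stage 1 stand-in: read detections from pre-labelled toy frames.

    The paper uses a stochastic attribute image grammar here; this
    function performs no image analysis at all.
    """
    return [
        SceneObject(d["name"], d["region"], frame["t"])
        for frame in frames
        for d in frame["detections"]
    ]


def infer_events(objects: List[SceneObject]) -> List[Event]:
    """Stage 2 stand-in: one hand-written rule instead of an event grammar.

    Rule: an object whose region changes between observations
    is reported as entering the new region.
    """
    events: List[Event] = []
    last_region: Dict[str, str] = {}
    for obj in sorted(objects, key=lambda o: o.time):
        previous = last_region.get(obj.name)
        if previous is not None and previous != obj.region:
            events.append(Event(obj.name, "enters", obj.region, obj.time))
        last_region[obj.name] = obj.region
    return events


def generate_text(events: List[Event]) -> List[str]:
    """Stage 3 stand-in: a fixed sentence template instead of HPSG."""
    return [
        f"At time {e.time}, the {e.agent} {e.action} the {e.location}."
        for e in events
    ]


# Toy input: a boat observed in open water, then in the harbor.
frames = [
    {"t": 0, "detections": [{"name": "boat", "region": "open water"}]},
    {"t": 1, "detections": [{"name": "boat", "region": "open water"}]},
    {"t": 2, "detections": [{"name": "boat", "region": "harbor"}]},
]

report = generate_text(infer_events(parse_frames(frames)))
print(report)
```

Because the intermediate Event records are structured rather than free text, they can be filtered or searched before any sentence is generated, which is the role the abstract assigns to the semantic representation.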

Keywords :

Internet; content-based retrieval; data mining; video retrieval; Internet video search; SAVE; automatic semantic annotation; bottom-up image analysis; content-based video annotation; event inference engine; grammar-based approach; head-driven phrase structure grammar; image parsing engine; scene content extraction; stochastic attribute image grammar; video data mining; video event markup language; video query; video retrieval; visual events; Content based retrieval; Data mining; Event detection; Humans; Image analysis; Information retrieval; Internet; Layout; Object detection; Search engines;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer Vision and Pattern Recognition Workshops, 2008. CVPRW '08. IEEE Computer Society Conference on

Conference_Location :

Anchorage, AK

ISSN :

2160-7508

Print_ISBN :

978-1-4244-2339-2

Electronic_ISBN :

2160-7508

Type :

conf

DOI :

10.1109/CVPRW.2008.4562954

Filename :

4562954

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2115398