DocumentCode :
3196716
Title :
A Large Scale Concept Ontology for News Stories: Empirical Methods, Analysis, and Improvements
Author :
Kender, John R.
Author_Institution :
Columbia Univ., New York
fYear :
2007
fDate :
2-5 July 2007
Firstpage :
544
Lastpage :
547
Abstract :
We analyze the completeness, accuracy, and utility of the largest known annotation ground truth database for video news stories, comprising nearly 680K individual tags on 62K shots using a vocabulary of 449 semantic concepts. We find the vocabulary is not yet mature: it does not follow Zipf's law, although concepts derived from vocabulary intersection do so more closely. We find that because many concepts are sparse, the best method for exposing the implicit semantic space is to use the distance measure G2, complete link clustering, and a heuristic distance cutoff based on shot cluster evolution history; this yields 12 well-defined major shot categories. Because the database is errorful, we derive a model for annotator error, and using it, we extract a natural concept subsumption ontology from the database, including some counter-intuitive relationships. Again using intersection, we demonstrate a method for identifying "missing" subconcepts from superconcepts. Lastly, we note that without superconcepts, shot clustering fails. These methods are unbiased by any prior semantic assumptions, and only depend on a statistically sufficient body of ground truth. They are therefore applicable to any other specific video retrieval domain.
Keywords :
ontologies (artificial intelligence); video retrieval; visual databases; annotation ground truth database; counter-intuitive relationships; heuristic distance cutoff; large scale concept ontology; link clustering; natural concept subsumption ontology; semantic concepts; video news stories; video retrieval domain; Computer science; Frequency; Government; History; Indexing; Large-scale systems; Multimedia communication; Multimedia databases; Ontologies; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multimedia and Expo, 2007 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
1-4244-1016-9
Electronic_ISBN :
1-4244-1017-7
Type :
conf
DOI :
10.1109/ICME.2007.4284707
Filename :
4284707