• DocumentCode
    3196716
  • Title

    A Large Scale Concept Ontology for News Stories: Empirical Methods, Analysis, and Improvements

  • Author

    Kender, John R.

  • Author_Institution
    Columbia Univ., New York
  • fYear
    2007
  • fDate
    2-5 July 2007
  • Firstpage
    544
  • Lastpage
    547
  • Abstract
    We analyze the completeness, accuracy, and utility of the largest known annotation ground truth database for video news stories, comprising nearly 680K individual tags on 62K shots using a vocabulary of 449 semantic concepts. We find the vocabulary is not yet mature: it does not follow Zipf's law, although concepts derived from vocabulary intersection do so more closely. We find that because many concepts are sparse, the best method for exposing the implicit semantic space is to use the distance measure G2, complete link clustering, and a heuristic distance cutoff based on shot cluster evolution history; this yields 12 well-defined major shot categories. Because the database is errorful, we derive a model for annotator error, and using it, we extract a natural concept subsumption ontology from the database, including some counter-intuitive relationships. Again using intersection, we demonstrate a method for identifying "missing" subconcepts from superconcepts. Lastly, we note that without superconcepts, shot clustering fails. These methods are unbiased by any prior semantic assumptions, and only depend on a statistically sufficient body of ground truth. They are therefore applicable to any other specific video retrieval domain.
  • Keywords
    ontologies (artificial intelligence); video retrieval; visual databases; annotation ground truth database; counter-intuitive relationships; heuristic distance cutoff; large scale concept ontology; link clustering; natural concept subsumption ontology; semantic concepts; video news stories; video retrieval domain; Computer science; Frequency; Government; History; Indexing; Large-scale systems; Multimedia communication; Multimedia databases; Ontologies; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia and Expo, 2007 IEEE International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    1-4244-1016-9
  • Electronic_ISBN
    1-4244-1017-7
  • Type

    conf

  • DOI
    10.1109/ICME.2007.4284707
  • Filename
    4284707
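
The abstract reports that concept frequencies in the vocabulary do not follow Zipf's law. One common way to check this kind of claim is to fit a line to log(frequency) against log(rank); a slope near -1 is consistent with Zipf's law. The sketch below is only an illustration of that general test and is not taken from the paper: the function name `zipf_slope` and the synthetic counts are hypothetical, and the paper may well use a different procedure.

```python
import math


def zipf_slope(counts):
    """Least-squares slope of log(frequency) versus log(rank).

    Hypothetical helper, not from the paper. A slope close to -1
    suggests the counts are roughly Zipfian.
    """
    # Rank concepts by descending frequency, ignoring unused concepts.
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two nonzero counts")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Ordinary least-squares slope: cov(x, y) / var(x).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, made-up tag counts (not data from the paper).
ideal_counts = [1000.0 / rank for rank in range(1, 51)]
flat_counts = [10] * 20

print(zipf_slope(ideal_counts))
print(zipf_slope(flat_counts))
```

A fitted slope far from -1, or a log-log plot that is visibly curved, would point to a non-Zipfian vocabulary of the kind the abstract describes.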