• DocumentCode
    531815
  • Title
    Wikipedia based semantic metadata annotation of audio transcripts
  • Author
    Paci, Giulio ; Pedrazzi, Giorgio ; Turra, Roberta
  • Author_Institution
    CINECA - Consorzio Interuniversitario, Casalecchio di Reno, Italy
  • fYear
    2010
  • fDate
    12-14 April 2010
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    A method to automatically annotate video items with semantic metadata is presented. The method has been developed in the context of the Papyrus project to annotate documentary-like broadcast videos with a set of relevant keywords using automatic speech recognition (ASR) transcripts as a primary complementary resource. The task is complicated by the high word error rate (WER) of the ASR for this kind of video. For this reason, a novel relevance criterion based on domain information is proposed. Wikipedia is used both as a source of metadata and as a linguistic resource for disambiguating keywords and for eliminating off-topic or out-of-domain keywords. Documents are annotated with relevant links to Wikipedia pages, concept definitions, synonyms, translations and concept categories.
  • Keywords
    Internet; audio signal processing; speech recognition; video signal processing; ASR transcript; Papyrus project; WER; Wikipedia; audio transcript; automatic speech recognition; broadcast video; semantic metadata annotation; video item annotation; word error rate; Context; Electronic publishing; Encyclopedias; Semantics; Speech recognition
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Image Analysis for Multimedia Interactive Services (WIAMIS), 2010 11th International Workshop on
  • Conference_Location
    Desenzano del Garda
  • Print_ISBN
    978-1-4244-7848-4
  • Electronic_ISBN
    978-88-905328-0-1
  • Type
    conf
  • Filename
    5617667