Title : 
Wikipedia based semantic metadata annotation of audio transcripts
         
        
            Author : 
Paci, Giulio ; Pedrazzi, Giorgio ; Turra, Roberta
         
        
            Author_Institution : 
CINECA - Consorzio Interuniversitario, Casalecchio di Reno, Italy
         
        
        
        
        
        
            Abstract : 
A method for automatically annotating video items with semantic metadata is presented. The method was developed within the Papyrus project to annotate documentary-like broadcast videos with a set of relevant keywords, using automatic speech recognition (ASR) transcripts as a primary complementary resource. The task is complicated by the high word error rate (WER) of ASR on this kind of video. For this reason, a novel relevance criterion based on domain information is proposed. Wikipedia is used both as a source of metadata and as a linguistic resource for disambiguating keywords and for eliminating off-topic or out-of-domain keywords. Documents are annotated with relevant links to Wikipedia pages, concept definitions, synonyms, translations, and concept categories.
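
The idea of a domain-based relevance criterion can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It only assumes that each candidate keyword can be mapped to a set of Wikipedia categories and that a keyword is kept when enough of its categories fall inside the target domain. The lookup table `TOY_CATEGORIES`, the functions `domain_score` and `filter_keywords`, and the `threshold` parameter are all hypothetical names introduced for this illustration.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical, hand-written stand-in for Wikipedia category lookups.
# A real system would query Wikipedia; this table is illustrative only.
TOY_CATEGORIES: Dict[str, Set[str]] = {
    "volcano": {"Geology", "Volcanology"},
    "magma": {"Geology", "Volcanology"},
    "lava": {"Volcanology"},
    "guitar": {"Music"},  # e.g. an off-topic word produced by an ASR error
}


def domain_score(keyword: str, domain: Set[str],
                 categories: Dict[str, Set[str]]) -> float:
    """Fraction of the keyword's categories that belong to the domain."""
    cats = categories.get(keyword.lower(), set())
    if not cats:
        return 0.0  # unknown words (often ASR noise) get no support
    return len(cats & domain) / len(cats)


def filter_keywords(candidates: Iterable[str], domain: Set[str],
                    categories: Dict[str, Set[str]] = TOY_CATEGORIES,
                    threshold: float = 0.5) -> List[str]:
    """Keep candidates whose domain score reaches the (assumed) threshold."""
    kept: List[str] = []
    for kw in candidates:
        if kw not in kept and domain_score(kw, domain, categories) >= threshold:
            kept.append(kw)
    return kept


if __name__ == "__main__":
    transcript_words = ["volcano", "guitar", "lava", "xyzzy", "volcano"]
    print(filter_keywords(transcript_words, {"Geology", "Volcanology"}))
```

In this toy setting, words with no category information or with categories outside the domain are dropped, which is one simple way a domain criterion could suppress keywords introduced by recognition errors.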
         
        
            Keywords : 
Internet; audio signal processing; speech recognition; video signal processing; ASR transcript; Papyrus project; WER; Wikipedia; audio transcript; automatic speech recognition; broadcast video; semantic metadata annotation; video item annotation; word error rate; Context; Electronic publishing; Encyclopedias; Semantics
         
        
        
        
            Conference_Title : 
Image Analysis for Multimedia Interactive Services (WIAMIS), 2010 11th International Workshop on
         
        
            Conference_Location : 
Desenzano del Garda
         
        
            Print_ISBN : 
978-1-4244-7848-4
         
        
            Electronic_ISBN : 
978-88-905328-0-1