Regional Information Center for Science and Technology - Generating web-based corpora for video transcripts categorization

Title of article :

Generating web-based corpora for video transcripts categorization

Author/Authors :

Perea-Ortega, José M.; Montejo-Ráez, Arturo; Martín-Valdivia, M. Teresa; Ureña-López, L. Alfonso

Issue Information :

Journal with serial number, year 2013

Pages :

From page :

337

To page :

344

Abstract :

This paper proposes using the Internet as a rich source of information for generating learning corpora for video transcripts categorization systems. Our main goal has been to study the behavior of different learning corpora generated from the Internet and to analyze some of their features. Specifically, Wikipedia, Google and the blogosphere have been used to generate these learning corpora, with the VideoCLEF 2008 track as the evaluation framework for the experiments. Based on this evaluation framework, we conclude that the proposed approach is a promising strategy for classifying videos from their transcripts. The different sizes of the generated corpora might suggest that larger corpora yield better results, but we show that corpus size is not always a reliable indicator of how a learning corpus behaves. The results show that integrating knowledge from the blogosphere or Google produces more reliable corpora for this task than those based on Wikipedia.

Keywords :

Web-based corpora generation , Automatic speech recognition (ASR) , Video transcripts categorization , Video tagging

Journal title :

Expert Systems with Applications

Serial Year :

2013

Record number :

2352935

Link To Document :

https://search.isc.ac/dl/search/defaultta.aspx?DTC=10&DC=2352935