Title :
Long-Term Incremental Web-Supervised Learning of Visual Concepts via Random Savannas
Author :
Ewerth, Ralph ; Ballafkir, Khalid ; Mühling, Markus ; Seiler, Dominik ; Freisleben, Bernd
Author_Institution :
Dept. of Math. & Comput. Sci., Univ. of Marburg, Marburg, Germany
Abstract :
The idea of using image and video data available in the World-Wide Web (WWW) as training data for classifier construction has received some attention in the past few years. In this paper, we present a novel incremental and scalable web-supervised learning system that continuously learns appearance models for image categories with heterogeneous appearances and improves these models periodically. The proposed system is initialized simply by specifying the name of the concept to be learned, and there is no further supervision afterwards. Textual and visual information on web sites is used to filter out irrelevant and misleading training images. To obtain a robust, flexible, and updatable way of learning, a novel learning framework is presented that relies on clustering to identify visual subclasses before using an ensemble of random forests, called a random savanna, for subclass learning. Experimental results demonstrate that the proposed web-supervised learning approach outperforms a support vector machine (SVM), while at the same time being easily parallelizable in the training and testing phases.
Keywords :
Web sites; image classification; information filtering; learning (artificial intelligence); pattern clustering; random processes; text analysis; video retrieval; WWW; World Wide Web; appearance models; classifier construction; clustering; heterogeneous appearances; image categories; image data; irrelevant training image filter; long-term incremental Web-supervised learning; misleading training image filter; random forest ensembles; random savannas; scalable Web-supervised learning system; subclass learning; textual information; training data; video data; visual concepts; visual information; visual subclass identification; Google; Semantics; Support vector machines; Training; Visualization; incremental learning; random forest; random savanna; web-supervised learning
Journal_Title :
Multimedia, IEEE Transactions on
DOI :
10.1109/TMM.2012.2186956