Title :
Implementation and Performance Evaluation of a Hybrid Distributed System for Storing and Processing Images from the Web
Author :
Krishna, Murali ; Kannan, Balaji ; Ramani, Anand ; Sathish, Sriram J.
Author_Institution :
Amazon, Bangalore, India
fDate :
Nov. 30 2010-Dec. 3 2010
Abstract :
Multimedia applications have changed so much in the recent past that they now call for a scalable and reliable processing and storage framework. Image processing algorithms such as pornographic content detection become far more challenging in terms of accuracy, recall, and speed when run on billions of images. This paper presents the design and implementation of a hybrid distributed architecture that uses the Hadoop Distributed File System (HDFS) for storage and the Map/Reduce paradigm for processing images crawled from the web. The architecture uses the Hadoop framework, in the form of Map/Reduce jobs, wherever a task needs to be parallelized, and uses standalone crawler nodes to fetch relevant content from the web. Evaluations on real-world web data indicate that the system can store and process billions of images in a few hours.
Keywords :
Internet; distributed databases; image processing; multimedia systems; network operating systems; Hadoop distributed file system; Map-Reduce paradigm; Web image processing; hybrid distributed architecture; hybrid distributed system; image storage framework; multimedia application; performance evaluation; pornographic content detection; reliable processing; scalable processing; world Web data; Crawlers; Feeds; Image storage; Multimedia communication; Pipelines; Scalability; Distributed Multimedia; HDFS; Hadoop; Map-Reduce;
Conference_Titel :
Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on
Conference_Location :
Indianapolis, IN
Print_ISBN :
978-1-4244-9405-7
Electronic_ISBN :
978-0-7695-4302-4
DOI :
10.1109/CloudCom.2010.116