DocumentCode
3681506
Title
Realtime file processing based on Map-Reduce framework
Author
George Cabău;Andrea Timea Sălăgean;Gheorghe Sebestyen-Pal
Author_Institution
Bitdefender, Technical University of Cluj-Napoca, Romania
fYear
2015
Firstpage
537
Lastpage
543
Abstract
Every day we find hundreds of thousands of new malicious samples, among which there are many clean files. Deciding which files are infected and which are clean requires intensive processing. Handling such volumes of files and extracted metadata demands a distributed system. Based on MapReduce, a concept proposed by Google and used by many other companies such as Yahoo! and Facebook, we developed a file processing system intended to fulfill our sample processing needs. The system can use computing systems with different hardware configurations to run a series of different tasks, and it automatically adjusts the tasks to each hardware system. We use a cascade of map and reduce tasks for extracting and processing the data, with an in-memory (RAM) key-value database as the data link between them. To prioritize some tasks over others, we created an algorithm that favors a higher-priority task over a lower-priority one when the system runs at full capacity, while trying to balance the cost of moving the same data over the network multiple times. Reliability and horizontal scalability were also taken into consideration when designing the system. One or multiple hardware failures will not affect the system, and adding more hardware systems will have a linear impact.
Keywords
"Program processors","Databases","Data mining","Hardware","Random access memory","Reliability","Programming"
Publisher
IEEE
Conference_Titel
Intelligent Computer Communication and Processing (ICCP), 2015 IEEE International Conference on
Type
conf
DOI
10.1109/ICCP.2015.7312716
Filename
7312716