Regional Information Center for Science and Technology - Parallelization of association rule mining: Survey

Abstract:

In today's big data era, modern applications generate and collect large amounts of data. As a result, data mining faces new challenges and opportunities in designing algorithms that can transform this voluminous data into actionable knowledge effectively and efficiently. Traditional algorithms were designed to run sequentially on a single machine. However, as the volume of data increases, the computational cost of processing it also increases. This makes it difficult to analyse data on a single sequential machine: instead of assisting in data analysis, the processor becomes a bottleneck. Parallel and distributed approaches improve performance in terms of computational cost and scalability, but they face limitations in load balancing, data partitioning, job assignment, monitoring, etc. MapReduce, a parallel programming model, is a new concept that provides seemingly unlimited computing power and cheap storage, and can overcome the above limitations. This makes it a topic of growing research interest. A detailed literature review of some existing methods is given, along with their pros and cons.
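To make the MapReduce idea concrete, the following is a minimal single-process sketch of how support counting for association rule mining might be split into a map step (count itemsets per data partition) and a reduce step (merge the partial counts). It is an illustration only and is not taken from any of the surveyed methods; the function names, the toy transactions, and the `min_support` threshold are all assumptions, and a real deployment would run the map step on separate cluster nodes rather than in a list comprehension.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def map_partition(transactions, k):
    """Map step (hypothetical helper): count every k-itemset in one partition."""
    counts = Counter()
    for transaction in transactions:
        # Sort and de-duplicate so each itemset has one canonical tuple form.
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step (hypothetical helper): merge partial counts of two partitions."""
    return left + right


def frequent_itemsets(partitions, k, min_support):
    """Return k-itemsets whose total count reaches min_support (absolute count)."""
    # In a real MapReduce job each partition would be processed on its own node.
    partial = [map_partition(p, k) for p in partitions]
    total = reduce(reduce_counts, partial, Counter())
    return {itemset: n for itemset, n in total.items() if n >= min_support}


# Toy data, assumed for illustration: two partitions of market-basket transactions.
partitions = [
    [["bread", "milk"], ["bread", "butter", "milk"]],
    [["milk", "butter"], ["bread", "milk"]],
]
result = frequent_itemsets(partitions, k=2, min_support=2)
print(result)
```

Because each partition is counted independently and the merge is a simple sum, the map step needs no coordination between workers, which is the property that makes this kind of counting a natural fit for the model.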