Author_Institution :
Comput. Sci. & Eng. Dept., York Univ., Toronto, ON, Canada
Abstract :
Social networks promote information sharing between people everywhere and at all times. Mining the data produced in this data-rich environment can be extremely useful. Frequent itemset mining plays an important role in mining associations, correlations, sequential patterns, causality, episodes, multidimensional patterns, max-patterns, partial periodicity, emerging patterns, and many other significant data mining tasks in social networks. With the exponential growth of social network data towards a terabyte or more, most traditional frequent itemset mining algorithms become ineffective because of either huge resource requirements or large communication overhead. Cloud computing has shown that very large datasets can be processed on commodity clusters, given the right programming model. MapReduce, a parallel programming model and one of the most important techniques in cloud computing, has emerged as a way to mine datasets of terabyte scale or larger on clusters of computers. In this paper, we propose an efficient frequent itemset mining algorithm, called IMRApriori, based on the MapReduce framework and running on a Hadoop cloud, a parallel storage and computing platform. Experimental results are presented to corroborate the theoretical claims.
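As an illustration of the general idea only, the following is a minimal single-machine sketch of how counting candidate itemsets can be phrased as a map step and a reduce step. It is not the IMRApriori algorithm, whose details the abstract does not give, and it does not use Hadoop. The function names (`map_phase`, `reduce_phase`, `frequent_itemsets`), the sample transactions, and the absolute `min_support` threshold are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def map_phase(partition, k):
    """Map step (hypothetical): emit (itemset, 1) for each k-item subset
    of every transaction in one data partition."""
    for transaction in partition:
        # Sort and deduplicate so the same itemset always yields the same key.
        for itemset in combinations(sorted(set(transaction)), k):
            yield itemset, 1


def reduce_phase(pairs):
    """Reduce step (hypothetical): sum the emitted counts per itemset."""
    counts = Counter()
    for itemset, count in pairs:
        counts[itemset] += count
    return counts


def frequent_itemsets(partitions, k, min_support):
    """Run map over every partition, reduce the results, and keep the
    itemsets whose total count reaches min_support (an absolute count)."""
    pairs = (pair for part in partitions for pair in map_phase(part, k))
    counts = reduce_phase(pairs)
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Toy data: two partitions standing in for splits processed by separate mappers.
partitions = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "d"], ["a", "b", "d"]],
]
result = frequent_itemsets(partitions, k=2, min_support=3)
print(result)
```

In a real MapReduce deployment the map calls would run in parallel on separate nodes and the framework would group the emitted pairs by key before the reduce step; here both steps run sequentially in one process to keep the sketch self-contained.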
Keywords :
cloud computing; data mining; parallel programming; social networking (online); Hadoop cloud; IMRApriori algorithm; MapReduce framework; association mining; causality mining; computer clusters; correlation mining; emerging pattern mining; episode mining; frequent itemset mining algorithm; information sharing; max-pattern mining; multidimensional pattern mining; parallel computing platform; parallel programming model; parallel storage platform; partial-periodicity mining; sequential pattern mining; social network data; Algorithm design and analysis; Clustering algorithms; Computational modeling; Itemsets; Social network services; Frequent Itemset Mining; MapReduce; Social networks