Performance of distributed apriori algorithms on a computational grid

Author

Rawat, Sandeep Singh ; Rajamani, Lakshmi

Author_Institution

Guru Nanak Inst. of Technol., Hyderabad, India

fYear

2009

fDate

7-11 Dec. 2009

Firstpage

163

Lastpage

167

Abstract

When large data repositories are coupled with geographic distribution of data, users and systems, different technologies must be combined to implement high-performance distributed knowledge discovery systems. At the same time, the computational grid is emerging as a very promising infrastructure for high-performance distributed computing. Grid applications in fields such as astronomy, chemistry, engineering, climate studies, geology, oceanography, ecology, physics, biology, health sciences and computer science often involve large amounts of computing and/or data. For these reasons, we think grids can offer effective support for the implementation and use of parallel and distributed data mining systems. This paper describes the development of a parallel and distributed Apriori algorithm in a grid environment. The Apriori algorithm, along with FP-growth (frequent pattern growth), is implemented on each node of the grid network; each node finds the local support counts and prunes all infrequent itemsets. After completing local pruning, each grid node broadcasts messages containing all the remaining frequent patterns to the coordinator. We have compared the output of the conventional Apriori algorithm with that of the FP-tree approach in both homogeneous and heterogeneous environments. Large practical datasets taken from the UCI Machine Learning Repository (adult, mushroom, and letter recognition) are used to measure system performance. The detailed experimental procedure and an analysis of the results are also discussed in this paper. In future work, the security issues among different local datasets and the high communication cost of data migration can be considered.
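
The local-count, local-prune, report-to-coordinator flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names (local_frequent_itemsets, coordinator_merge) are hypothetical, grid nodes are simulated as in-memory lists, brute-force enumeration of small itemsets stands in for Apriori's level-wise candidate generation, and the coordinator's second counting pass is an assumption, since the abstract does not say how the coordinator combines the reported patterns.

```python
from collections import Counter
from itertools import combinations


def local_frequent_itemsets(transactions, min_support, max_size=2):
    """Count itemsets of up to max_size items in one node's partition
    and keep only those that are frequent locally (hypothetical helper)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    threshold = min_support * len(transactions)
    return {itemset: n for itemset, n in counts.items() if n >= threshold}


def coordinator_merge(partitions, min_support, max_size=2):
    """Assumed coordinator step: take the union of the itemsets each node
    reports as locally frequent, then recount them over all partitions to
    obtain exact global support counts."""
    candidates = set()
    for partition in partitions:
        candidates |= set(local_frequent_itemsets(partition, min_support, max_size))

    total = sum(len(partition) for partition in partitions)
    global_counts = Counter()
    for partition in partitions:
        for transaction in partition:
            items = set(transaction)
            for candidate in candidates:
                if items.issuperset(candidate):
                    global_counts[candidate] += 1
    return {c: n for c, n in global_counts.items() if n >= min_support * total}


# Two simulated grid nodes, each holding its own partition of the data.
node_a = [["a", "b"], ["a", "c"], ["a", "b", "c"]]
node_b = [["b", "c"], ["a", "b"], ["b"]]
result = coordinator_merge([node_a, node_b], min_support=0.5)
```

Because any itemset that is frequent over the whole dataset must be frequent in at least one partition, taking the union of the locally frequent itemsets as candidates does not miss any globally frequent itemset; the recount then removes candidates that are frequent only locally.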

Keywords

data mining; grid computing; parallel processing; pattern classification; UCI machine repository; computational grid; data migration; data repositories; distributed apriori algorithms; distributed computing; distributed data mining systems; distributed knowledge discovery systems; frequent pattern growth; geographic distribution; parallel data mining systems; security; Application software; Astrochemistry; Astronomy; Biology computing; Chemical technology; Data engineering; Distributed computing; Grid computing; Marine technology; Space technology; Association rules; Data mining; High performance

fLanguage

English

Publisher

IEEE

Conference_Titel

Services Computing Conference, 2009. APSCC 2009. IEEE Asia-Pacific

Conference_Location

Singapore

Print_ISBN

978-1-4244-5338-2

Electronic_ISBN

978-1-4244-5336-8

Type

conf

DOI

10.1109/APSCC.2009.5394128

Filename

5394128