DocumentCode
3222932
Title
Performance of distributed apriori algorithms on a computational grid
Author
Rawat, Sandeep Singh ; Rajamani, Lakshmi
Author_Institution
Guru Nanak Inst. of Technol., Hyderabad, India
fYear
2009
fDate
7-11 Dec. 2009
Firstpage
163
Lastpage
167
Abstract
When large data repositories are coupled with the geographic distribution of data, users, and systems, different technologies must be combined to implement high-performance distributed knowledge discovery systems. Meanwhile, the computational grid is emerging as a very promising infrastructure for high-performance distributed computing. Grid applications in fields such as astronomy, chemistry, engineering, climate studies, geology, oceanography, ecology, physics, biology, health sciences, and computer science often involve large amounts of computing and/or data. For these reasons, we think grids can offer effective support for the implementation and use of parallel and distributed data mining systems. This paper describes the development of a parallel and distributed Apriori algorithm in a grid environment. The Apriori algorithm, along with FP-growth (frequent pattern growth), is implemented on each node of the grid network; each node finds the local support counts and prunes all infrequent itemsets. After completing local pruning, each grid node broadcasts messages containing all the remaining frequent patterns to the coordinator. We compare the output of the conventional Apriori method with that of the FP-tree approach in both homogeneous and heterogeneous environments. Large practical datasets taken from the UCI Machine Learning Repository (Adult, Mushroom, and Letter Recognition) are used to measure system performance. The detailed experimental procedure and an analysis of the results are also discussed. Future work can consider security issues among different local datasets and the high communication cost of data migration.
Keywords
data mining; grid computing; parallel processing; pattern classification; UCI machine repository; computational grid; data migration; data repositories; distributed apriori algorithms; distributed computing; distributed data mining systems; distributed knowledge discovery systems; frequent pattern growth; geographic distribution; parallel data mining systems; security; Application software; Astrochemistry; Astronomy; Biology computing; Chemical technology; Data engineering; Marine technology; Space technology; Association rules; High performance
fLanguage
English
Publisher
IEEE
Conference_Titel
Services Computing Conference, 2009. APSCC 2009. IEEE Asia-Pacific
Conference_Location
Singapore
Print_ISBN
978-1-4244-5338-2
Electronic_ISBN
978-1-4244-5336-8
Type
conf
DOI
10.1109/APSCC.2009.5394128
Filename
5394128