High performance clustering for large data warehouses using peer-to-peer genetic algorithm

Author

Shah, M. Nauman ; Mahmood, Rafia

Author_Institution

Nat. Univ. of Comput. & Emerging Sci., FAST-NU, Islamabad, Pakistan

fYear

2003

fDate

8-9 Dec. 2003

Firstpage

420

Lastpage

423

Abstract

High volumes of data pose a challenge to the scalability of data mining algorithms. Dividing the data into equal partitions and processing them in parallel is a natural choice. Peer-to-peer computing offers a promising way to exploit parallelism and maintain scale-up capability. We consider parallelism in genetic algorithms while computing the fitness of the population individuals (chromosomes). This strategy has an edge over its counterpart, parallelism in the genetic operators, because genetic operators tend to be computationally cheap. Simply put, this scheme supports large data sets: the larger the data size, the larger the degree of parallelism achieved.
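
The idea in the abstract, evaluating each chromosome's fitness over equal data partitions in parallel and then combining the partial results, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: a chromosome is encoded as a list of cluster centroids, fitness is the negated sum of squared distances to the nearest centroid, and a local thread pool stands in for the peers. The names `split_equal`, `partial_sse`, and `fitness` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_equal(data, n_parts):
    """Split data into n_parts contiguous partitions of nearly equal size."""
    size, rem = divmod(len(data), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < rem else 0)
        parts.append(data[start:end])
        start = end
    return parts


def partial_sse(partition, centroids):
    """Work done by one (assumed) peer: squared error over its own partition."""
    total = 0.0
    for point in partition:
        total += min(
            sum((p - c) ** 2 for p, c in zip(point, centroid))
            for centroid in centroids
        )
    return total


def fitness(chromosome, partitions, pool):
    """Combine per-partition partial sums; a lower error means a fitter chromosome."""
    partials = pool.map(partial_sse, partitions, [chromosome] * len(partitions))
    return -sum(partials)


# Toy data and a toy population of two chromosomes (two centroids each).
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
population = [
    [(0.0, 0.5), (10.0, 10.5)],
    [(5.0, 5.0), (6.0, 6.0)],
]

partitions = split_equal(data, 2)
with ThreadPoolExecutor(max_workers=2) as pool:  # threads stand in for peers
    scores = [fitness(ch, partitions, pool) for ch in population]

best = max(range(len(population)), key=scores.__getitem__)
```

In this sketch the genetic operators (selection, crossover, mutation) would stay sequential, since the abstract notes they are computationally cheap; only the fitness evaluation is spread across partitions, so the amount of parallel work grows with the data size.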

Keywords

data mining; data warehouses; genetic algorithms; parallel algorithms; pattern clustering; peer-to-peer computing; chromosomes; genetic algorithm; high performance clustering; large data warehouses; population fitness; scalability; Biological cells; Clustering algorithms; Concurrent computing; Parallel processing; Partitioning algorithms

fLanguage

English

Publisher

IEEE

Conference_Titel

Multi Topic Conference, 2003. INMIC 2003. 7th International

Print_ISBN

0-7803-8183-1

Type

conf

DOI

10.1109/INMIC.2003.1416762

Filename

1416762