Title :
Scaling Genetic Programming for data classification using MapReduce methodology
Author :
Al-Madi, Nailah ; Ludwig, Simone
Author_Institution :
Dept. of Comput. Sci., North Dakota State Univ., Fargo, ND, USA
Abstract :
Genetic Programming (GP) is an optimization method that has been shown to achieve good results. It solves problems by generating programs and applying nature-inspired operations to these programs until a good solution is found. GP has been used to solve many classification problems; however, its drawback is its long execution time. When GP is applied to a classification task, the execution time increases in proportion to the dataset size. Therefore, to manage the long execution time, the GP algorithm is parallelized to speed up the classification process. Our GP is implemented using the MapReduce methodology (abbreviated as MRGP) in order to benefit from MapReduce's fault tolerance, load balancing, and data locality. MRGP not only reduces the execution time of GP for large datasets, it also makes it possible to use large population sizes, thus finding the best result in fewer generations. MRGP is evaluated using population sizes ranging from 1,000 to 100,000, measuring accuracy, scalability, and speedup.
Keywords :
fault tolerance; genetic algorithms; parallel algorithms; pattern classification; resource allocation; GP algorithm; MRGP; MapReduce methodology; data classification; data locality; dataset size; execution time; genetic programming; load balancing; optimization; Accuracy; Blood; Evolutionary computation; Hadoop; MapReduce; Parallel Processing
Conference_Title :
Nature and Biologically Inspired Computing (NaBIC), 2013 World Congress on
Conference_Location :
Fargo, ND
Print_ISBN :
978-1-4799-1414-2
DOI :
10.1109/NaBIC.2013.6617851