  • DocumentCode
    2555097
  • Title
    A novel parallel hybrid PSO-GA using MapReduce to schedule jobs in Hadoop data grids
  • Author
    Sadasivam, Sudha G.; Selvaraj, Dharini
  • Author_Institution
    Dept. of CSE, PSG Coll. of Technol., Coimbatore, India
  • fYear
    2010
  • fDate
    15-17 Dec. 2010
  • Firstpage
    377
  • Lastpage
    382
  • Abstract
    Scheduling heterogeneous tasks in a heterogeneous grid environment aims to utilize resources effectively and to share the load among the available resources. Such a task assignment problem is NP-hard. This paper presents a Hybrid Particle Swarm Optimization-Genetic Algorithm (HPSO-GA) for solving the task assignment problem. The proposed Particle Swarm Optimization (PSO) variant incorporates GA operations such as crossover and mutation to improve resource utilization and to complete tasks within their deadlines. The algorithm aims to distribute load among the heterogeneous resources in the grid environment according to their capacity. Analyzing data- and computation-intensive applications such as web log processing and bioinformatics with optimal performance is time consuming, so parallelizing the optimization function is essential. Large-scale parallelization of the optimization function must also guarantee efficient communication, load balancing, fault tolerance and reliability. This paper therefore presents a MapReduce HPSO-GA based on the MapReduce parallel programming model. HPSO-GA yields better results than standard PSO and provides better load balancing and resource utilization in the grid environment. It identifies the exact node in a Hadoop cluster to which a task can be assigned. Hence, together with Hadoop and system parameters, the proposed approach can be used in Hadoop's resource management system to schedule jobs efficiently in a Hadoop cluster.
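To make the hybrid idea in the abstract concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' HPSO-GA. It assumes a discrete encoding in which a particle is a list mapping each task to a node index, and a fitness equal to the makespan when each node's load is the total task length divided by the node's capacity (deadlines are ignored). PSO's velocity update is replaced by uniform crossover toward the personal and global bests, followed by GA mutation. The task lengths, node capacities, parameters (`c1`, `c2`, `pm`) and all function names are hypothetical. The "map" and "reduce" steps use Python's built-in `map` and `functools.reduce` as an in-process analogy to the MapReduce model; they are not Hadoop's API.

```python
import random
from functools import reduce

# Hypothetical toy instance: task lengths and node capacities (same work units).
TASKS = [40, 25, 60, 10, 35, 50, 20, 30]
NODES = [10.0, 20.0, 15.0]

def makespan(assignment):
    """Fitness to minimize: finish time of the busiest node, where a node's
    load is the sum of its task lengths divided by its capacity."""
    load = [0.0] * len(NODES)
    for task, node in enumerate(assignment):
        load[node] += TASKS[task] / NODES[node]
    return max(load)

def crossover(child, guide, rate):
    """Uniform crossover: take each gene from `guide` with probability `rate`.
    Stands in for PSO's pull toward pbest/gbest in this discrete encoding."""
    return [g if random.random() < rate else c for c, g in zip(child, guide)]

def mutate(assignment, rate):
    """GA mutation: reassign each task to a random node with probability `rate`."""
    return [random.randrange(len(NODES)) if random.random() < rate else n
            for n in assignment]

def best_of(particles):
    """'Map' scores every particle independently; 'reduce' keeps the lowest makespan."""
    scored = map(lambda p: (makespan(p), p), particles)
    return reduce(lambda a, b: a if a[0] <= b[0] else b, scored)

def hpso_ga(swarm_size=20, iterations=100, c1=0.3, c2=0.3, pm=0.05, seed=0):
    random.seed(seed)
    swarm = [[random.randrange(len(NODES)) for _ in TASKS]
             for _ in range(swarm_size)]
    pbest = [p[:] for p in swarm]
    gbest_fit, gbest = best_of(pbest)
    gbest = gbest[:]
    for _ in range(iterations):
        for i, particle in enumerate(swarm):
            moved = crossover(particle, pbest[i], c1)  # cognitive component
            moved = crossover(moved, gbest, c2)        # social component
            swarm[i] = mutate(moved, pm)               # keeps swarm diversity
            if makespan(swarm[i]) < makespan(pbest[i]):
                pbest[i] = swarm[i][:]
        fit, best = best_of(pbest)
        if fit < gbest_fit:
            gbest_fit, gbest = fit, best[:]
    return gbest, gbest_fit

if __name__ == "__main__":
    best, fit = hpso_ga()
    print(best, round(fit, 2))
```

Calling `hpso_ga()` returns the best task-to-node assignment found and its makespan. The fitness evaluations inside `best_of` are the natural candidates for distribution across mappers, since each particle is scored independently and only the comparison needs to be reduced.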
  • Keywords
    data analysis; fault tolerant computing; genetic algorithms; grid computing; particle swarm optimisation; pattern clustering; processor scheduling; resource allocation; task analysis; Hadoop cluster; Hadoop data grid; MapReduce; NP-hard problem; fault tolerance; genetic algorithm; heterogeneous grid environment; hybrid particle swarm optimization; job scheduling; large-scale parallelisation; load balancing; load distribution; optimization function; reliability; resource management system; resource utilization; task assignment problem; Program processors; HPSO-GA; Hadoop; cluster performance; scheduler
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2010 Second World Congress on Nature and Biologically Inspired Computing (NaBIC)
  • Conference_Location
    Fukuoka
  • Print_ISBN
    978-1-4244-7377-9
  • Type
    conf
  • DOI
    10.1109/NABIC.2010.5716346
  • Filename
    5716346