Title :
Parallel feature selection algorithm based on rough sets and particle swarm optimization
Author :
Adamczyk, Mateusz
Author_Institution :
Fac. of Math., Inf., & Mech., Univ. of Warsaw, Warsaw, Poland
Abstract :
The aim of this paper is to propose a new method of solving feature selection problem. Foundations of presented algorithm lie in the theory of rough sets. Feature selection methods based on rough sets have been used with success in many data mining problems, but their weakness is their computational complexity. In order to overcome the above-mentioned problem, researches used diverse approximation techniques. This paper presents a new approach to approximation of reducts. Particle swarm optimization (PSO) is a stochastic metaheuristic similar to genetic algorithms. The idea is to see each potential solution as a particle with certain velocity flying through the problem space. The PSO finds optimal solutions by interactions of individuals in population. The main advantage of the PSO over genetic algorithms, is that PSO does not require complex operators such as crossover or mutation. It only uses simple mathematical operators to update position and velocity of each particle, which makes PSO computationally inexpensive in terms of both memory and runtime. The presented feature selection algorithm treats each feature subset as separate particle. Optimal subset, in terms of selected measure, is discovered as particles fly within the problem space. In order to speed up calculations and balance usage of hardware resources (processors, memory), parallel asynchronous version of PSO is applied. It is based on scheduling calculations of complex fitness function on slave processors, while the main one is responsible for updating particles data and checking algorithm´s convergence. Applied approach scales well and provides balanced usage of given resources even if it is not feasible to use the same computational power of every processor, for instance when used resources are not homogeneous. Proposed method was tested on selected set of data sets from the UCI repository and results were compared to some of the classical algorithms.
Keywords :
approximation theory; computational complexity; data mining; feature selection; mathematical operators; particle swarm optimisation; rough set theory; scheduling; stochastic processes; UCI repository; complex fitness function; computational complexity; data mining problems; mathematical operators; parallel asynchronous version; parallel feature selection algorithm; particle swarm optimization; reduct approximation; rough sets; scheduling calculations; slave processors; stochastic metaheuristic; Accuracy; Equations; Particle swarm optimization; Program processors; Rough sets; Sociology; Statistics; Feature Selection; Parallel Asynchronous Particle Swarm Optimization; Particle Swarm Optimization; Rough Sets;
Conference_Titel :
Computer Science and Information Systems (FedCSIS), 2014 Federated Conference on
Conference_Location :
Warsaw