Author :
Faceli, Katti ; de Souto, Marcilio C. P. ; de Carvalho, Andre C. P. L. F.
Abstract :
One advantage of Pareto-based multi-objective genetic algorithms for clustering over classical clustering algorithms is that, instead of a single solution (partition), they output a set of solutions (an approximation of the Pareto front, or Pareto front for short). However, such a set can be very large (e.g., hundreds of partitions) and, consequently, difficult to analyze manually. We present a selection strategy, based on the corrected Rand index, that aims to recommend, as the final solution of Pareto-based multi-objective genetic algorithm approaches, a subset of partitions from the Pareto front. This subset should be much smaller than the full Pareto front while preserving the quality and the diversity of its partitions. To test our strategy, we conduct a case study in which we apply it to the sets of solutions obtained with the multi-objective clustering ensemble algorithm (MOCLE) on several data sets.
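For readers unfamiliar with the corrected Rand index named above, the following minimal Python sketch computes it for two partitions of the same objects, using the standard Hubert and Arabie formulation. It illustrates the similarity measure only, not the authors' selection strategy, which the abstract does not detail; the function name `corrected_rand_index` and the label-list representation of a partition are assumptions made for this example.

```python
from collections import Counter
from math import comb


def corrected_rand_index(labels_a, labels_b):
    """Corrected (adjusted) Rand index between two partitions.

    Each partition is given as a sequence of cluster labels, one per object
    (a representation assumed for this sketch). Returns 1.0 for identical
    partitions and values near 0.0 for chance-level agreement.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no object pairs to disagree on

    # Contingency counts: objects shared by each pair of clusters.
    cell_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    # Number of object pairs placed together in both, in A, and in B.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2        # maximum attainable agreement
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)
```

Because the index depends only on which objects are grouped together, relabeling the clusters of a partition does not change its value. This makes it suitable for comparing partitions produced by different runs or algorithms.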
Keywords :
approximation theory; genetic algorithms; pattern clustering; Pareto front approximation; Pareto-based multi-objective genetic algorithms; corrected Rand index; multi-objective clustering approaches; Algorithm design and analysis; Approximation algorithms; Clustering algorithms; Computer networks; Data mining; Genetic algorithms; Information analysis; Neural networks; Partitioning algorithms; Testing; Ensemble; Multi-Objective Clustering;