DocumentCode
1714884
Title
An improved density-based cluster analysis method combining genetic algorithm and data sampling for large-scale datasets
Author
Ye Zonglin ; Cao Hui ; Wang Miaomiao ; Zhang Yanbin
Author_Institution
State Key Lab. of Electr. Insulation & Power Equip., Xi'an Jiaotong Univ., Xi'an, China
fYear
2013
Firstpage
3552
Lastpage
3555
Abstract
This paper proposes an improved density-based cluster analysis method for large-scale datasets that combines a genetic algorithm with data sampling. First, the method draws samples from the original dataset to form a sampling dataset. Second, density-based spatial clustering of applications with noise (DBSCAN) is run on the sampling dataset, with a genetic algorithm searching for the neighborhood radius (Eps) and the minimum number of points (MinPts), using the Minkowski score as the fitness function. Finally, the resulting Eps and MinPts are transformed to account for the difference in scale between the original dataset and the sampling dataset, and DBSCAN is run on the original dataset with the transformed parameters. Experiments use three datasets from the UCI Machine Learning Repository. The results indicate that the proposed method achieves higher clustering capability and that parameter selection is easier and more effective.
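The abstract names the Minkowski score as the fitness function but does not give its formula. As a minimal sketch, assuming the common pair-counting definition (the square root of the number of point pairs on which the two partitions disagree, divided by the number of pairs grouped together in the reference partition), it could be computed as below. The function name `minkowski_score` is hypothetical, and DBSCAN noise labels are treated here as an ordinary cluster label, which may differ from the paper's handling.

```python
from itertools import combinations
from math import sqrt


def minkowski_score(true_labels, pred_labels):
    """Pair-counting Minkowski score (assumed definition, not taken from the paper).

    Lower is better; 0.0 means the two partitions agree on every pair.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")

    n11 = 0  # pairs grouped together in both partitions
    n10 = 0  # pairs grouped together only in the reference partition
    n01 = 0  # pairs grouped together only in the predicted partition
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            n11 += 1
        elif same_true:
            n10 += 1
        elif same_pred:
            n01 += 1

    if n11 + n10 == 0:
        raise ValueError("reference partition has no co-clustered pairs")
    return sqrt((n01 + n10) / (n11 + n10))
```

Because the score is defined on pairs, it does not depend on how clusters are numbered, so a genetic algorithm could minimize it directly over candidate (Eps, MinPts) pairs on a labeled sampling dataset. The pairwise loop is quadratic in the number of points, which is one reason to evaluate it on the sample and not on the full dataset.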
Keywords
genetic algorithms; learning (artificial intelligence); pattern clustering; user interfaces; DBSCAN; Eps; MinPts; Minkowski score; UCI Machine Learning Repository; data sampling; density based spatial clustering of applications with noise; density-based cluster analysis method; fitness function; genetic algorithm; large-scale datasets; Algorithm design and analysis; Clustering algorithms; Educational institutions; Genetic algorithms; Machine learning algorithms; Optimization; Partitioning algorithms; Cluster analysis; DBSCAN; Data sampling; Genetic algorithm;
fLanguage
English
Publisher
ieee
Conference_Titel
Control Conference (CCC), 2013 32nd Chinese
Conference_Location
Xi'an
Type
conf
Filename
6640036