• DocumentCode
    1714884
  • Title

    An improved density-based cluster analysis method combining genetic algorithm and data sampling for large-scale datasets

  • Author

    Ye Zonglin ; Cao Hui ; Wang Miaomiao ; Zhang Yanbin

  • Author_Institution
    State Key Lab. of Electr. Insulation & Power Equip., Xi'an Jiaotong Univ., Xi'an, China
  • fYear
    2013
  • Firstpage
    3552
  • Lastpage
    3555
  • Abstract
    This paper proposes an improved density-based cluster analysis method that combines a genetic algorithm with data sampling for large-scale datasets. First, the method draws samples from the original dataset to obtain a sampling dataset. Second, density-based spatial clustering of applications with noise (DBSCAN) is run on the sampling dataset together with a genetic algorithm to determine the neighborhood radius (Eps) and the minimum number of points (MinPts), with the Minkowski score as the fitness function. Finally, the obtained MinPts and Eps are transformed to account for the difference in scale between the original dataset and the sampling dataset, and DBSCAN is performed on the original dataset with the transformed parameters. Three datasets from the UCI Machine Learning Repository are used in the experiments. The experimental results indicate that the proposed method has higher clustering capability and makes parameter selection easier and more effective.
  • Keywords
    genetic algorithms; learning (artificial intelligence); pattern clustering; user interfaces; DBSCAN; Eps; MinPts; Minkowski score; UCI Machine Learning Repository; data sampling; density based spatial clustering of applications with noise; density-based cluster analysis method; fitness function; genetic algorithm; large-scale datasets; Algorithm design and analysis; Clustering algorithms; Educational institutions; Machine learning algorithms; Optimization; Partitioning algorithms; Cluster analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2013 32nd Chinese Control Conference (CCC)
  • Conference_Location
    Xi'an
  • Type

    conf

  • Filename
    6640036
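
The abstract names the Minkowski score as the fitness function for the genetic algorithm but does not define it. The sketch below uses the commonly cited pair-counting definition, sqrt((n01 + n10) / (n11 + n10)), which is assumed here to be the one the paper uses. The function name `minkowski_score` and its arguments are illustrative and not taken from the paper. A score of 0 means the clustering matches the reference partition exactly, and lower is better. Labels are compared by plain equality, so any noise label (for example -1) is treated as one ordinary group, which is a simplification.

```python
from itertools import combinations
from math import sqrt


def minkowski_score(true_labels, pred_labels):
    """Pair-counting Minkowski score between a reference partition and a clustering.

    n11: pairs grouped together in both partitions
    n10: pairs grouped together only in the reference partition
    n01: pairs grouped together only in the predicted clustering
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")

    n11 = n10 = n01 = 0
    # Examine every unordered pair of points once (quadratic in the number of points).
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            n11 += 1
        elif same_true:
            n10 += 1
        elif same_pred:
            n01 += 1

    if n11 + n10 == 0:
        # The reference partition puts every point in its own group.
        raise ValueError("reference partition has no co-clustered pairs")
    return sqrt((n01 + n10) / (n11 + n10))


if __name__ == "__main__":
    reference = [0, 0, 1, 1]
    print(minkowski_score(reference, [0, 0, 1, 1]))  # identical partition
    print(minkowski_score(reference, [0, 0, 0, 0]))  # everything merged
```

Because the loop visits every pair of points, the cost grows quadratically with dataset size, which is one reason evaluating the fitness on a sampled subset, as the abstract describes, is cheaper than evaluating it on the full dataset.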