• DocumentCode
    834057
  • Title

    A genetic algorithm using hyper-quadtrees for low-dimensional k-means clustering

  • Author

    Laszlo, Michael ; Mukherjee, Sumitra

  • Author_Institution
    Graduate Sch. of Comput. & Inf. Sci., Nova Southeastern Univ., Fort Lauderdale, FL, USA
  • Volume
    28
  • Issue
    4
  • fYear
    2006
  • fDate
    4/1/2006 12:00:00 AM
  • Firstpage
    533
  • Lastpage
    543
  • Abstract
    The k-means algorithm is widely used for clustering because of its computational efficiency. Given n points in d-dimensional space and the number of desired clusters k, k-means seeks a set of k-cluster centers so as to minimize the sum of the squared Euclidean distance between each point and its nearest cluster center. However, the algorithm is very sensitive to the initial selection of centers and is likely to converge to partitions that are significantly inferior to the global optimum. We present a genetic algorithm (GA) for evolving centers in the k-means algorithm that simultaneously identifies good partitions for a range of values around a specified k. The set of centers is represented using a hyper-quadtree constructed on the data. This representation is exploited in our GA to generate an initial population of good centers and to support a novel crossover operation that selectively passes good subsets of neighboring centers from parents to offspring by swapping subtrees. Experimental results indicate that our GA finds the global optimum for data sets with known optima and finds good solutions for large simulated data sets.
  • Keywords
    genetic algorithms; pattern clustering; quadtrees; crossover operation; genetic algorithm; hyper-quadtree representation; low-dimensional k-means clustering; squared Euclidean distance; Biological cells; Clustering algorithms; Computational efficiency; Convergence; Euclidean distance; Genetic algorithms; Machine learning; Machine learning algorithms; Partitioning algorithms; Pattern recognition; center selection.; clustering; genetic algorithms; k-means algorithm; optimal partition; quadtrees; Algorithms; Artificial Intelligence; Cluster Analysis; Computer Simulation; Information Storage and Retrieval; Models, Genetic; Models, Statistical; Pattern Recognition, Automated;
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    0162-8828
  • Type

    jour

  • DOI
    10.1109/TPAMI.2006.66
  • Filename
    1597111