• DocumentCode
    2076019
  • Title

    Seeding Cluster Centers of K-means Clustering through Median Projection

  • Author

    Suresh, Lalith ; Simha, Jay B. ; Velur, Rajappa

  • Author_Institution
    Dept. of CSE, Cambridge Inst. of Technol., Bangalore, India
  • fYear
    2010
  • fDate
    15-18 Feb. 2010
  • Firstpage
    217
  • Lastpage
    222
  • Abstract
    K-means clustering is an important algorithm for identifying structure in data and is one of the simplest clustering algorithms. It takes a predefined number of clusters as input. The original algorithm selects cluster centers at random and iteratively improves the result. This approach has two major limitations. First, the number of clusters must be specified in advance, which is difficult because the underlying structure is not known. Second, random selection of cluster centers can leave the algorithm in a local optimum. In addition, most K-means implementations rely on in-memory data structures, which limits the data size. In this work, a novel approach to seeding the clusters using the latent data structure is proposed. It is expected to minimize the need to specify the number of clusters a priori and to reduce convergence time by providing near-optimal cluster centers. In addition, the algorithm is implemented in SQL to provide a disk-based solution that can handle large data sets that do not fit into memory. The proposed solution was tested on both row-store and column-store databases. The results are promising, and work is in progress to test the approach in different domains.
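For orientation, the baseline that the abstract contrasts against (random selection of cluster centers followed by iterative refinement) can be sketched as below. This is a minimal in-memory illustration, not the paper's method: the median-projection seeding and the SQL implementation are not specified in this record and are not reproduced here. The function name `kmeans` and its parameters `iters` and `seed` are assumptions chosen for the example.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Baseline K-means with random seeding (NOT median-projection seeding).

    points: list of equal-length numeric tuples; k: number of clusters.
    Returns (centers, clusters), where clusters[i] holds the points
    assigned to centers[i] in the last assignment step.
    """
    rng = random.Random(seed)
    # Random seeding: pick k distinct input points as initial centers.
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center
        # (squared Euclidean distance; ties go to the lower index).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, clusters
```

A call such as `kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], k=2)` returns two centers and the corresponding point groups. Because the seeds are drawn at random, different `seed` values can lead to different local optima on harder data, which is the limitation the abstract sets out to address.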
  • Keywords
    SQL; data structures; pattern clustering; K-means clustering; column store databases; disk based solution; latent data structure; median projection; random selection; row store databases; seeding cluster centers; Algorithm design and analysis; Clustering algorithms; Competitive intelligence; Convergence; Databases; Intelligent structures; Iterative algorithms; Partitioning algorithms; Software systems; Testing; Clustering; Median projection and Median Selection; Multidimensional data; prediction model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Complex, Intelligent and Software Intensive Systems (CISIS), 2010 International Conference on
  • Conference_Location
    Krakow
  • Print_ISBN
    978-1-4244-5917-9
  • Type

    conf

  • DOI
    10.1109/CISIS.2010.133
  • Filename
    5447429