• DocumentCode
    3426617
  • Title

    An approximation algorithm for finding skeletal points for density based clustering approaches

  • Author

    Yeganeh, Soheil Hassas ; Habibi, Jafar ; Abolhassani, Hassan ; Tehrani, Mahdi Abbaspour ; Esmaelnezhad, Jamshid

  • Author_Institution
    Comput. Eng. Dept., Sharif Univ. of Techonology, Tehran
  • fYear
    2009
  • fDate
    March 30 2009-April 2 2009
  • Firstpage
    403
  • Lastpage
    410
  • Abstract
    Clustering is the problem of finding relations in a data set in an supervised manner. These relations can be extracted using the density of a data set, where density of a data point is defined as the number of data points around it. To find the number of data points around another point, region queries are adopted. Region queries are the most expensive construct in density based algorithm, so it should be optimized to enhance the performance of density based clustering algorithms specially on large data sets. Finding the optimum set of region queries to cover all the data points has been proven to be NP-complete. This optimum set is called the skeletal points of a data set. In this paper, we proposed a generic algorithms which fires region queries at most 6 times the optimum number of region queries (has 6 as approximation factor). Also, we have extend this generic algorithm to create a DBSCAN (the most wellknown density based algorithm) derivative, named ADBSCAN. Presented experimental results show that ADBSCAN has a better approximation to DBSCAN than the DBRS-H (the most well-known randomized density based algorithm) in terms of performance and quality of clustering, specially for large data sets.
  • Keywords
    computational complexity; optimisation; pattern clustering; ADBSCAN; DBRS-H; DBSCAN; NP-complete; approximation algorithm; data points; data set; density based clustering approaches; generic algorithms; randomized density based algorithm; region queries; skeletal points; Approximation algorithms; Clustering algorithms; Databases; Euclidean distance; Extraterrestrial measurements; Impedance; Information retrieval; Information systems; Logic; Web sites; Approximation Algorithms; Clustering; Data Mining; Spatial Clustering;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Computational Intelligence and Data Mining, 2009. CIDM '09. IEEE Symposium on
  • Conference_Location
    Nashville, TN
  • Print_ISBN
    978-1-4244-2765-9
  • Type

    conf

  • DOI
    10.1109/CIDM.2009.4938678
  • Filename
    4938678