DocumentCode :
1932948
Title :
A Linear DBSCAN Algorithm Based on LSH
Author :
Wu, Yi-pu ; Guo, Jin-jiang ; Zhang, Xue-Jie
Author_Institution :
Yunnan Univ., Kunming
Volume :
5
fYear :
2007
fDate :
19-22 Aug. 2007
Firstpage :
2608
Lastpage :
2614
Abstract :
DBSCAN algorithm is used widely because it can effectively handle noise points and deal with data of any type in clustering. However, it has two inherent limitations: high time complexity O(NlogN) and poor ability in dealing large-scale data. In this paper, a linear DBSCAN based on LSH is proposed. In our algorithm the process of Nearest Neighbor Search is optimized by hashing. Compared with the original DBSCAN algorithm, the time complexity of this improved DBSCAN descends to O(N). Experimentally, this improved DBSCAN makes a significant decrease in the running time while maintaining the Cluster quality of the results. Moreover, the speedup (the running time of original DBSCAN algorithm divided by the running time of improved algorithm) increases with the size and dimension of dataset, and the parameter Eps of our algorithm does not have a strong influence on the clustering result. These improved properties enable DBSCAN to be used in a large scope.
Keywords :
computational complexity; file organisation; pattern clustering; search problems; unsupervised learning; very large databases; clustering method; hashing method; large-scale data; linear DBSCAN algorithm; nearest neighbor search; noise point handling; time complexity; unsupervised learning method; Clustering algorithms; Computer science; Cybernetics; Image segmentation; Information retrieval; Large-scale systems; Machine learning; Machine learning algorithms; Nearest neighbor searches; Partitioning algorithms; Clustering; DBSCAN; LSH; Large-scale data; Unsupervised learning;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Machine Learning and Cybernetics, 2007 International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-0973-0
Electronic_ISBN :
978-1-4244-0973-0
Type :
conf
DOI :
10.1109/ICMLC.2007.4370588
Filename :
4370588
Link To Document :
بازگشت