Title :
Scalable fuzzy neighborhood DBSCAN
Author :
Parker, Jonathon K. ; Hall, Lawrence O. ; Kandel, Abraham
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of South Florida, Tampa, FL, USA
Abstract :
The majority of data available in most disciplines is unlabeled and unclassified. The amount of data is often massive, so scalable processing methods are required. One method of providing structure to unlabeled data is to group it by clustering. Density-based methods discover the number of clusters, and the clusters they find can be irregular in shape. In this paper we examine a version of DBSCAN modified to use fuzzy membership functions (FN-DBSCAN). FN-DBSCAN was implemented using the WEKA data mining framework, and a scalable technique (SFN-DBSCAN) is simulated using the framework. Experimental results show that SFN-DBSCAN can be over three times as fast as FN-DBSCAN for small- to medium-sized data. The resulting cluster assignments match at an average rate of 90% when compared with assignments by FN-DBSCAN. SFN-DBSCAN's speed increases proportionally with respect to the number of subsets, but cluster assignment concurrence between FN-DBSCAN and SFN-DBSCAN degrades as the number of subsets increases.
Keywords :
data mining; fuzzy set theory; WEKA data mining framework; density based method; fuzzy membership functions; scalable fuzzy neighborhood DBSCAN; scalable processing; scalable technique; Accuracy; Classification algorithms; Clustering algorithms; Complexity theory; Fuzzy logic; Noise; Runtime
Conference_Title :
Fuzzy Systems (FUZZ), 2010 IEEE International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-6919-2
DOI :
10.1109/FUZZY.2010.5584527