Title :
Cluster validity analysis using subsampling
Author :
Abul, Osman ; Lo, Anthony ; Alhajj, Reda ; Polat, Faruk ; Barker, Ken
Author_Institution :
Dept. Comput. Sci., Calgary Univ., Alta., Canada
Abstract :
Cluster validity analysis investigates whether generated clusters are true clusters or merely an artifact of chance. This is usually done through subsampling-based stability analysis. A related problem is estimating the true number of clusters in a given dataset. A number of methods for both purposes have been described in the literature. In this paper, we propose three methods for estimating confidence in the validity of a clustering result. The first method validates the clustering result by employing supervised classifiers: the dataset is divided into training and test sets, a classifier is trained on the cluster labels of the training set, and its accuracy is evaluated on the test set. This method measures confidence in the generalization capability of the clustering. The second method is based on the observation that if a clustering is valid, then each of its subsets should also be valid. The third method is similar to the second but takes the dual approach: each individual cluster is expected to be stable and compact. In all three methods, confidence is estimated by repeating the process a number of times on subsamples. Experimental results illustrate the effectiveness of the proposed methods.
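Illustrative_Sketch :
The classifier-based validation described above lends itself to a short sketch. The following Python code is an illustrative assumption rather than the authors' implementation: it uses scikit-learn's KMeans as a stand-in for the clustering algorithm, a k-nearest-neighbour classifier as the supervised learner, and held-out accuracy averaged over subsamples as the confidence estimate.

# Sketch of classifier-based cluster validation by subsampling.
# Assumptions (not from the paper): k-means clusterer, k-NN classifier,
# and mean test accuracy over subsamples as the confidence score.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def clustering_confidence(X, n_clusters=3, n_subsamples=20,
                          subsample_frac=0.8, seed=0):
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_subsamples):
        # Draw a random subsample of the data.
        idx = rng.choice(len(X), size=int(subsample_frac * len(X)),
                         replace=False)
        Xs = X[idx]
        # Cluster the subsample and treat cluster assignments as class labels.
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=seed).fit_predict(Xs)
        # Split into training and test sets, train a supervised classifier
        # on the cluster labels, and score it on the held-out points.
        X_tr, X_te, y_tr, y_te = train_test_split(Xs, labels, test_size=0.5,
                                                  random_state=seed)
        clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
        scores.append(clf.score(X_te, y_te))
    # High, stable accuracy across subsamples suggests the clustering
    # generalizes, i.e., is unlikely to be due to chance.
    return float(np.mean(scores)), float(np.std(scores))

# Example usage on synthetic data with three well-separated clusters:
# from sklearn.datasets import make_blobs
# X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
# print(clustering_confidence(X, n_clusters=3))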
Keywords :
data analysis; generalisation (artificial intelligence); learning (artificial intelligence); pattern clustering; sampling methods; statistical analysis; cluster validity analysis; confidence estimation; dataset; generalization; subsampling stability analysis; supervised classifiers; test sets; training set; Clustering algorithms; Computer science; Humans; Organizing; Pattern analysis; Sampling methods; Stability analysis; Testing; Visualization;
Conference_Title :
2003 IEEE International Conference on Systems, Man and Cybernetics
Print_ISBN :
0-7803-7952-7
DOI :
10.1109/ICSMC.2003.1244614