Title :
A new cluster validity measure for bioinformatics relational datasets
Author :
Popescu, Mihail ; Bezdek, James C. ; Keller, James M. ; Havens, Timothy C. ; Huband, Jacalyn M.
Author_Institution :
Health Manage. & Med. Inf. Dept., Univ. of Missouri, Columbia, MO
Abstract :
Many important applications in biology have underlying datasets that are relational: only the (dis)similarity between biological objects (amino acid sequences, gene expression profiles, etc.) is known, not their feature values in some feature space. Examples of such relational datasets are the gene similarity matrices obtained from BLAST, gene expression data, or gene ontology (GO) similarity measures. Once a relational dataset is obtained, a common question is how many groups of objects are represented in it. The answer is usually obtained by employing a clustering algorithm together with a cluster validity measure. In this article, we describe a cluster validity measure for non-Euclidean relational fuzzy c-means that is based on the correlation between the original relational data and a relation induced on the data by the cluster memberships. This validity measure can be applied to partitions produced by any fuzzy relational clustering algorithm. We illustrate our measure by validating clusters in several dissimilarity matrices for a set of 194 gene products obtained using BLAST and GO similarities.
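The correlation idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the construction below is an assumption: the induced relation is taken as 1 - U^T U / max(U^T U) for a fuzzy membership matrix U, and Pearson correlation is computed over the upper triangles of the induced and original dissimilarity matrices. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def induced_dissimilarity(U):
    """Dissimilarity induced by a c x n fuzzy membership matrix U.

    Assumed construction (not confirmed by the abstract): objects with
    similar membership vectors get a small induced dissimilarity.
    """
    S = U.T @ U                 # n x n membership-based similarity
    return 1.0 - S / S.max()    # rescale and flip into a dissimilarity

def correlation_validity(U, D):
    """Pearson correlation between induced and original dissimilarities."""
    D_u = induced_dissimilarity(U)
    iu = np.triu_indices(D.shape[0], k=1)   # each object pair once, no diagonal
    return np.corrcoef(D_u[iu], D[iu])[0, 1]

# Toy relational data: objects 0,1 are close and objects 2,3 are close.
D = np.array([[0.00, 0.10, 0.90, 0.80],
              [0.10, 0.00, 0.85, 0.90],
              [0.90, 0.85, 0.00, 0.15],
              [0.80, 0.90, 0.15, 0.00]])

# A partition that matches the structure and one that does not.
U_good = np.array([[0.9, 0.9, 0.1, 0.1],
                   [0.1, 0.1, 0.9, 0.9]])
U_bad = np.array([[0.9, 0.1, 0.9, 0.1],
                  [0.1, 0.9, 0.1, 0.9]])

print("matching partition:   ", correlation_validity(U_good, D))
print("mismatched partition: ", correlation_validity(U_bad, D))
```

Under these assumptions, a partition whose memberships agree with the original dissimilarities scores closer to 1, so candidate numbers of clusters could be compared by running the clustering for each candidate and keeping the highest score. A rank-based correlation could be substituted for Pearson if only the ordering of dissimilarities is trusted.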
Keywords :
biology computing; data analysis; fuzzy set theory; pattern clustering; bioinformatics relational datasets; cluster validity measure; fuzzy relational clustering algorithm; gene expression data; gene ontology; nonEuclidean relational fuzzy c-means; Amino acids; Bioinformatics; Clustering algorithms; Extraterrestrial measurements; Fuzzy sets; Fuzzy systems; Gene expression; Ontologies; Partitioning algorithms; Sequences;
Conference_Title :
Fuzzy Systems, 2008. FUZZ-IEEE 2008. (IEEE World Congress on Computational Intelligence). IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-1818-3
Electronic_ISBN :
1098-7584
DOI :
10.1109/FUZZY.2008.4630450