Algorithm for Discovering Low-Variance 3-Clusters from Real-Valued Datasets

Author

Hu, Zhen ; Bhatnagar, Raj

Author_Institution

Dept. of Comput. Sci., Univ. of Cincinnati, Cincinnati, OH, USA

fYear

2010

fDate

13-17 Dec. 2010

Firstpage

236

Lastpage

245

Abstract

The concept of triclusters has recently been investigated in the context of two relational datasets that share labels along one of the dimensions. Processing the two datasets simultaneously to unveil triclusters can yield new and useful knowledge and insights. However, some recently reported methods are either closely tied to specific problems or constrain the datasets to follow specific distributions. Many applications need algorithms that generate triclusters whose cell values exhibit simple, well-known statistical properties, such as upper bounds on standard deviations. In this paper we present a 3-Clustering algorithm that searches for meaningful combinations of biclusters in two related datasets. The algorithm can handle situations in which (i) a few data objects are present in only one of the two datasets, (ii) the two datasets have different numbers of objects and/or attributes, and (iii) the cell-value distributions in the two datasets differ. In our formulation, each selected tricluster is formed by two independent biclusters such that the standard deviation of the cell values in each bicluster obeys an upper bound and the sets of objects in the two biclusters overlap to the maximum possible extent. We validate the algorithm by presenting the properties of the 3-Clusters discovered from a synthetic dataset and from a real-world cross-species genomic dataset. The results unveil interesting insights for the cross-species genomic domain.
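
The two criteria named in the abstract (a standard-deviation upper bound on each bicluster, and overlap between the object sets of the two biclusters) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it only checks a given candidate and does not perform the search. The function names, the dict-of-lists data layout, the use of population standard deviation, the Jaccard ratio as the overlap measure, and the thresholds `max_std` and `min_overlap` are all assumptions made for illustration.

```python
from statistics import pstdev


def bicluster_std(data, objects, attrs):
    """Population standard deviation of the cells data[obj][a]
    for obj in objects and a in attrs (assumed layout: dict of lists)."""
    values = [data[obj][a] for obj in objects for a in attrs]
    return pstdev(values)


def object_overlap(objs_a, objs_b):
    """Jaccard ratio of two object-label sets (an assumed overlap measure)."""
    a, b = set(objs_a), set(objs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_low_variance_tricluster(data1, objs1, attrs1,
                               data2, objs2, attrs2,
                               max_std, min_overlap):
    """True if both biclusters respect the std-dev bound and their
    object sets overlap by at least min_overlap (hypothetical threshold)."""
    return (bicluster_std(data1, objs1, attrs1) <= max_std
            and bicluster_std(data2, objs2, attrs2) <= max_std
            and object_overlap(objs1, objs2) >= min_overlap)


# Two toy datasets sharing object labels; "g3" and "g4" each occur in only one.
data1 = {"g1": [1.0, 1.1, 5.0], "g2": [1.2, 0.9, 7.0], "g3": [1.0, 1.0, 2.0]}
data2 = {"g1": [3.0, 3.1], "g2": [2.9, 3.0], "g4": [9.0, 0.0]}

candidate_ok = is_low_variance_tricluster(
    data1, ["g1", "g2", "g3"], [0, 1],
    data2, ["g1", "g2"], [0, 1],
    max_std=0.2, min_overlap=0.5,
)
print(candidate_ok)
```

The sketch keys both datasets by object label so that objects present in only one dataset, and datasets with different numbers of attributes, need no special handling.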

Keywords

data mining; pattern clustering; search problems; statistical analysis; cell-value distributions; low variance cluster; real valued dataset; relational datasets; standard deviation; statistical property; triclusters; Co-clustering

fLanguage

English

Publisher

IEEE

Conference_Titel

Data Mining (ICDM), 2010 IEEE 10th International Conference on

Conference_Location

Sydney, NSW

ISSN

1550-4786

Print_ISBN

978-1-4244-9131-5

Electronic_ISBN

1550-4786

Type

conf

DOI

10.1109/ICDM.2010.77

Filename

5693977