• DocumentCode
    2207730
  • Title
    Algorithm for Discovering Low-Variance 3-Clusters from Real-Valued Datasets
  • Author
    Hu, Zhen ; Bhatnagar, Raj
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Cincinnati, Cincinnati, OH, USA
  • fYear
    2010
  • fDate
    13-17 Dec. 2010
  • Firstpage
    236
  • Lastpage
    245
  • Abstract
    The concept of triclusters has recently been investigated in the context of two relational datasets that share labels along one of the dimensions. By processing the two datasets simultaneously to unveil triclusters, new useful knowledge and insights can be obtained. However, some recently reported methods are either closely tied to specific problems or require the datasets to follow specific distributions. Many applications need algorithms that generate triclusters whose cell-values exhibit simple, well-known statistical properties, such as upper bounds on standard deviations. In this paper we present a 3-Clustering algorithm that searches for meaningful combinations of biclusters in two related datasets. The algorithm can handle situations in which: (i) a few data objects are present in only one of the two datasets, (ii) the two datasets have different numbers of objects and/or attributes, and (iii) the cell-value distributions of the two datasets differ. In our formulation, each selected tricluster is formed by two independent biclusters such that the standard deviation of the cell-values in each bicluster obeys an upper bound and the sets of objects in the two biclusters overlap to the maximum possible extent. We validate our algorithm by presenting the properties of the 3-Clusters discovered from a synthetic dataset and from a real-world cross-species genomic dataset. The results unveil interesting insights for the cross-species genomic domain.
  • Keywords
    data mining; pattern clustering; search problems; statistical analysis; cell-value distributions; low variance cluster; real valued dataset; relational datasets; standard deviation; statistical property; triclusters; Co-clustering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2010 IEEE 10th International Conference on
  • Conference_Location
    Sydney, NSW
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4244-9131-5
  • Electronic_ISBN
    1550-4786
  • Type
    conf
  • DOI
    10.1109/ICDM.2010.77
  • Filename
    5693977