A Novel Biclustering Algorithm for Discovering Value-Coherent Overlapping δ-Biclusters

Author

Das, Chandra ; Maji, Pradipta ; Chattopadhyay, Samiran

Author_Institution

Dept. of Comput. Sci. & Eng., Netaji Subhash Eng. Coll., Kolkata

fYear

2008

fDate

14-17 Dec. 2008

Firstpage

148

Lastpage

156

Abstract

Biclustering is a useful tool for analyzing gene expression data when some genes have multiple functions and the experimental conditions under which expression is measured are diverse. It seeks a subset of genes and a subset of experimental conditions that together exhibit coherent behavior. A large number of biclustering algorithms have been developed for analyzing gene expression data. Most of them find exclusive biclusters, which is inappropriate in the biological context: biological processes are not independent of each other, and many genes participate in several processes. Hence, nonexclusive biclustering algorithms are required for finding highly overlapping biclusters. In this regard, a novel overlapping biclustering algorithm is presented that finds overlapping biclusters of larger volume with mean squared residue lower than a given threshold. The proposed method consists of two phases. First, a set of highly coherent seeds is generated by a two-way k-medoids algorithm, in which mutual information replaces Euclidean distance as the similarity measure. The seeds are then iteratively adjusted (enlarged or reduced) by adding or removing genes and conditions based on a new quantitative index. In effect, the proposed method provides highly overlapping coherent biclusters with mean squared residue lower than a given threshold. Some quantitative indices are introduced for evaluating the quality of the generated biclusters. The quality of the biclusters found by the proposed approach is discussed, and the results are compared with those reported for existing methods. In general, the proposed approach performs well at finding patterns in gene expression data.
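
The mean squared residue criterion mentioned in the abstract can be illustrated with a short sketch. It assumes the widely used definition of Cheng and Church, in which the residue of an entry is its value minus its row mean and column mean within the bicluster, plus the overall bicluster mean; the abstract itself does not spell out the formula. The function names, the `delta` parameter, and the toy matrix are hypothetical, and this is not the authors' implementation.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows and cols.

    Assumes the Cheng and Church definition (an assumption, not taken
    from the abstract): residue = a_ij - rowmean_i - colmean_j + mean.
    """
    n_rows, n_cols = len(rows), len(cols)
    if n_rows == 0 or n_cols == 0:
        raise ValueError("a bicluster needs at least one row and one column")

    # Means are taken inside the bicluster only, not over the full matrix.
    row_mean = {i: sum(matrix[i][j] for j in cols) / n_cols for i in rows}
    col_mean = {j: sum(matrix[i][j] for i in rows) / n_rows for j in cols}
    overall = sum(row_mean[i] for i in rows) / n_rows

    total = 0.0
    for i in rows:
        for j in cols:
            residue = matrix[i][j] - row_mean[i] - col_mean[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


def is_below_threshold(matrix, rows, cols, delta):
    """True when the residue is lower than the threshold delta (hypothetical name)."""
    return mean_squared_residue(matrix, rows, cols) < delta


# Toy expression matrix: genes are rows, conditions are columns.
# The first three genes follow one additive pattern; the fourth does not.
expression = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [4.0, 5.0, 6.0],
    [9.0, 1.0, 5.0],
]

coherent = mean_squared_residue(expression, [0, 1, 2], [0, 1, 2])
with_outlier = mean_squared_residue(expression, [0, 1, 2, 3], [0, 1, 2])
```

In this toy example the first three genes differ only by a constant shift, so their residue is essentially zero, while adding the fourth gene raises it. An adjustment phase like the one the abstract describes would use a score of this kind when deciding whether a gene or condition may be added without exceeding the threshold.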

Keywords

bioinformatics; data analysis; data mining; genetics; pattern clustering; Euclidean distance; biclustering algorithm; biological process; gene expression data analysis; gene expression measurement; mean squared residue; quantitative index; similarity measure; two-way k-medoids algorithm; value-coherent overlapping delta-bicluster discovery; Algorithm design and analysis; Biological processes; Computer science; Data analysis; Data engineering; Educational institutions; Gene expression; Information analysis; Information technology; Machine intelligence;

fLanguage

English

Publisher

IEEE

Conference_Titel

Advanced Computing and Communications, 2008. ADCOM 2008. 16th International Conference on

Conference_Location

Chennai

Print_ISBN

978-1-4244-2962-2

Electronic_ISBN

978-1-4244-2963-9

Type

conf

DOI

10.1109/ADCOM.2008.4760441

Filename

4760441