• DocumentCode
    2461680
  • Title

    A Novel Biclustering Algorithm for Discovering Value-Coherent Overlapping δ-Biclusters

  • Author

    Das, Chandra ; Maji, Pradipta ; Chattopadhyay, Samiran

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Netaji Subhash Eng. Coll., Kolkata
  • fYear
    2008
  • fDate
    14-17 Dec. 2008
  • Firstpage
    148
  • Lastpage
    156
  • Abstract
    Biclustering is a useful tool for analyzing gene expression data when some genes have multiple functions and the experimental conditions under which expression is measured are diverse. It seeks a subset of genes and a subset of experimental conditions that together exhibit coherent behavior. A large number of biclustering algorithms have been developed for analyzing gene expression data. Most of them find exclusive biclusters, which is inappropriate in the biological context: since biological processes are not independent of each other, many genes participate in multiple processes. Hence, nonexclusive biclustering algorithms are required for finding highly overlapping biclusters. In this regard, a novel overlapping biclustering algorithm is presented to find overlapping biclusters of larger volume with mean squared residue lower than a given threshold. The proposed method consists of two phases. First, a set of highly coherent seeds is generated using a two-way k-medoids algorithm, in which mutual information, rather than Euclidean distance, serves as the similarity measure. The seeds are then iteratively adjusted (enlarged or degenerated) by adding or removing genes and conditions based on a new quantitative index. In effect, the proposed method provides highly overlapping coherent biclusters with mean squared residue lower than a given threshold. Some quantitative indices are introduced for evaluating the quality of the generated biclusters. The quality of the biclusters found by the proposed approach is discussed, and the results are compared with those reported for existing methods. In general, the proposed approach shows excellent performance at finding patterns in gene expression data.
  • Keywords
    bioinformatics; data analysis; data mining; genetics; pattern clustering; Euclidean distance; biclustering algorithm; biological process; gene expression data analysis; gene expression measurement; mean squared residue; quantitative index; similarity measure; two-way k-medoids algorithm; value-coherent overlapping delta-bicluster discovery; Algorithm design and analysis; Biological processes; Computer science; Data analysis; Data engineering; Educational institutions; Gene expression; Information analysis; Information technology; Machine intelligence;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Computing and Communications, 2008. ADCOM 2008. 16th International Conference on
  • Conference_Location
    Chennai
  • Print_ISBN
    978-1-4244-2962-2
  • Electronic_ISBN
    978-1-4244-2963-9
  • Type

    conf

  • DOI
    10.1109/ADCOM.2008.4760441
  • Filename
    4760441
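
The abstract's acceptance criterion, a mean squared residue below a given threshold, can be illustrated with a minimal sketch. It assumes the standard Cheng and Church definition of mean squared residue (the record itself does not spell out the formula), and the function name, the toy matrix, and the threshold value are all hypothetical. It does not reproduce the paper's seed generation or its new quantitative index.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows and cols.

    Assumes the standard definition: the residue of an entry is
    a_ij - (row mean) - (column mean) + (submatrix mean).
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


# Hypothetical toy expression matrix: genes in rows, conditions in columns.
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [4.0, 5.0, 6.0, 7.0],
    [0.0, 8.0, 2.0, 5.0],
]

delta = 0.5  # hypothetical threshold, not a value from the paper

# The first three genes differ only by a constant shift over the first
# three conditions, so this candidate bicluster is value-coherent.
coherent = mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2])
whole = mean_squared_residue(expression, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])

print("candidate accepted:", coherent < delta)
print("whole matrix accepted:", whole < delta)
```

A candidate whose rows are additive shifts of one another scores a residue of essentially zero, while the full matrix scores higher; a threshold test of this kind is what decides whether a bicluster is kept.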