  • DocumentCode
    1815258
  • Title
    Clustering genes using gene expression and text literature data
  • Author
    Yang, Chengyong ; Zeng, Erliang ; Li, Tao ; Narasimhan, Giri
  • Author_Institution
    Bioinformatics Res. Group, Florida Int. Univ., Miami, FL, USA
  • fYear
    2005
  • fDate
    8-11 Aug. 2005
  • Firstpage
    329
  • Lastpage
    340
  • Abstract
    Clustering of gene expression data is a standard technique used to identify closely related genes. In this paper, we develop a new clustering algorithm, MSC (Multi-Source Clustering), to perform exploratory analysis using two or more diverse sources of data. In particular, we investigate the problem of improving the clustering by integrating information obtained from gene expression data with knowledge extracted from biomedical text literature. In each iteration of algorithm MSC, an EM-type procedure is employed to bootstrap the model obtained from one data source by starting with the cluster assignments obtained in the previous iteration using the other data sources. Upon convergence, the two individual models are used to construct the final cluster assignment. We compare the results of algorithm MSC for two data sources with the results obtained when the clustering is applied to the two sources of data separately. We also compare them with the results obtained using the feature-level integration method, which performs the clustering after simply concatenating the features obtained from the two data sources. We show that the z-scores of the clustering results from MSC are better than those from the other methods. To further evaluate our clusters, function enrichment results are presented using terms from the Gene Ontology database. Finally, by investigating the success of motif detection programs that use the clusters, we show that our approach of integrating gene expression data and text data reveals clusters that are biologically more meaningful than those identified using gene expression data alone.
  • Keywords
    biology computing; genetics; ontologies (artificial intelligence); statistical analysis; Gene Ontology database; MSC; biomedical text literature; cluster assignment; clustering algorithm; clustering genes; data source; feature level integration method; gene expression data; integrating information; iteration; motif detection program; multisource clustering; text literature data; Algorithm design and analysis; Bioinformatics; Clustering algorithms; Computer science; Data analysis; Data mining; Databases; Gene expression; Information analysis; Performance analysis; Biological Literature; Gene Expression Data; Multi-Source Clustering; Text Mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Systems Bioinformatics Conference, 2005. Proceedings. 2005 IEEE
  • Print_ISBN
    0-7695-2344-7
  • Type
    conf
  • DOI
    10.1109/CSB.2005.23
  • Filename
    1498034
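
Note: the abstract describes an alternating, EM-type scheme in which the model for one data source is refit starting from the cluster assignments produced with the other source. The sketch below is only an illustration of that general idea and is not the authors' MSC implementation. It assumes hard assignments (a k-means-style E-step and M-step), squared Euclidean distance, and a final assignment made by summing per-source distances; the abstract specifies none of these. The names `fit_centroids`, `distances`, and `multi_source_sketch` are hypothetical.

```python
import numpy as np


def fit_centroids(X, labels, k):
    """M-step: per-cluster mean of the rows of X; empty clusters stay NaN."""
    centroids = np.full((k, X.shape[1]), np.nan)
    for c in range(k):
        members = X[labels == c]
        if len(members) > 0:
            centroids[c] = members.mean(axis=0)
    return centroids


def distances(X, centroids):
    """Squared Euclidean distance from every row of X to every centroid.

    Empty clusters (NaN centroids) get infinite distance so they are never chosen.
    """
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.where(np.isnan(d), np.inf, d)


def multi_source_sketch(views, k, init_labels=None, n_iter=50, seed=0):
    """Alternate between data sources, seeding each refit with the other's labels.

    views: list of (n_genes, n_features) float arrays, one per data source
           (e.g. expression profiles and text-derived term vectors).
    Assumption: hard assignments and summed distances for the final step.
    """
    n = views[0].shape[0]
    if init_labels is None:
        rng = np.random.default_rng(seed)
        labels = rng.integers(0, k, size=n)
    else:
        labels = np.asarray(init_labels).copy()

    for _ in range(n_iter):
        previous = labels.copy()
        for X in views:
            # Fit this source's model from the labels left by the other source,
            # then reassign genes under the freshly fitted model.
            centroids = fit_centroids(X, labels, k)
            labels = distances(X, centroids).argmin(axis=1)
        if np.array_equal(previous, labels):
            break  # assignments stopped changing

    # Final assignment: combine the per-source models by summing distances.
    total = np.zeros((n, k))
    for X in views:
        total += distances(X, fit_centroids(X, labels, k))
    return total.argmin(axis=1)
```

Calling `multi_source_sketch([X_expr, X_text], k)` with two arrays that share the same gene order returns one cluster label per gene. Summing raw distances implicitly weights the sources by their feature scale, so in practice each source would likely need to be normalized first.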