• DocumentCode
    2997620
  • Title

    Predicting tumor-suppressing genes in cancer via clustering the developmental stage gene expression profile

  • Author

    Singh, Nitin Kumar ; Vidyasagar, M. ; White, Michael A.

  • Author_Institution
    Bioeng. Dept., Univ. of Texas at Dallas, Richardson, TX, USA
  • fYear
    2011
  • fDate
    7-8 April 2011
  • Firstpage
    116
  • Lastpage
    120
  • Abstract
    In this paper we study the problem of predicting which genes are likely to have a role in tumor suppression in lung and colorectal cancer. Mutation frequencies alone cannot differentiate between 'drivers' (mutations that cause cancer) and 'passengers' (mutations that are caused by cancer); some other features must be added. Our hypothesis is that the developmental stage gene expression profile provides one such additional feature that can potentially serve to differentiate between drivers and passengers. The developmental stage gene expression profile refers to the seven-dimensional vector of the gene's expression, as found in the Unigene database. We focus our attention on two sets of genes: (i) a master set of more than 1,700 genes found to be mutated in breast and colorectal cancer tissues in a famous study by Wood et al., and (ii) a set of nearly 1,800 genes consisting of all genes that have been tested for mutations in lung cancer in the COSMIC database and that have a developmental gene expression profile in the Unigene database. An experimental study by a team led by the third author tested a set of 151 'CAN-genes' as identified in, and identified a subset of 65 'hits' that resulted in cell proliferation; the rest were classified as 'misses'. The challenge is to reproduce these results at a high level of significance using a classification approach. Using the K-means algorithm, the seven-dimensional expression profile vectors for 1,799 genes were grouped into two clusters, which were properly separated as indicated by a silhouette value of 0.37. The first cluster contained 15 hits and 8 misses out of a total of 626 genes, while the second cluster contained 13 hits and 20 misses out of a total of 1,173 genes. The null hypothesis that the known hits and misses occur in equal proportions in both clusters can be rejected at a 1.56% level, while the null hypothesis that both clusters contain an equal proportion of hits can be rejected at a 0.89% level.
    In short, clustering based on developmental gene expression level provides quite significant discrimination between known experimental outcomes. Going forward, further experiments need to be performed to verify that the first cluster does indeed contain more hits than the second cluster. Also, the approach needs to be extended to other forms of cancer.
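The hit and miss counts quoted in the abstract form a 2x2 table (cluster by outcome), so the comparison between clusters can be illustrated with a standard contingency-table test. The sketch below uses a two-sided Fisher exact test, which is an assumption made here for illustration: the abstract does not say which test produced the 1.56% and 0.89% levels, so this code is not expected to reproduce those figures. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Illustrative only: the abstract does not name the test the authors used.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more probable than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Counts from the abstract: (hits, misses) among the tested genes in each cluster.
hits_1, misses_1 = 15, 8
hits_2, misses_2 = 13, 20

p_value = fisher_exact_two_sided(hits_1, misses_1, hits_2, misses_2)
print(f"hit rate, cluster 1: {hits_1 / (hits_1 + misses_1):.3f}")
print(f"hit rate, cluster 2: {hits_2 / (hits_2 + misses_2):.3f}")
print(f"two-sided Fisher exact p-value: {p_value:.4f}")
```

This test conditions only on the 56 genes with known outcomes; a test that also takes the cluster sizes (626 and 1,173) into account would answer a different question and give a different p-value.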
  • Keywords
    biological organs; biological techniques; biology computing; cancer; cellular biophysics; genetics; lung; molecular biophysics; molecular configurations; pattern clustering; tumours; 7D gene expression vector; COSMIC database; K-means algorithm; Unigene database; cancer causing mutations; cancer induced mutations; cell proliferation; colorectal cancer; developmental stage gene expression profile; gene expression profile clustering; lung cancer; null hypothesis; tumor suppressing gene prediction; Breast; Cancer; Databases; Gene expression; Humans; Lungs; Mice;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Life Science Systems and Applications Workshop (LiSSA), 2011 IEEE/NIH
  • Conference_Location
    Bethesda, MD
  • Print_ISBN
    978-1-4577-0421-5
  • Type

    conf

  • DOI
    10.1109/LISSA.2011.5754170
  • Filename
    5754170