  • DocumentCode
    1784791
  • Title
    Identifying differentially expressed genes for ordinal phenotypes
  • Author
    Yongkang Kim ; Taesung Park
  • Author_Institution
    Dept. of Stat., Seoul Nat. Univ., Seoul, South Korea
  • fYear
    2014
  • fDate
    2-5 Nov. 2014
  • Firstpage
    193
  • Lastpage
    196
  • Abstract
    A common goal of microarray analysis is to identify differentially expressed genes (DEGs) between groups, usually through two-group comparisons. Many statistical methods have been developed toward this end, such as the t-test and the permutation test. In some cases, more than two groups of interest are compared, for example, when identifying DEGs across three or four stages of cancer or across different stages of the cell cycle. Several statistical approaches are also available for such multi-group analyses, including analysis of variance (ANOVA) models. We hypothesized that statistical methods developed for identifying DEGs across ordered groups would provide higher power by exploiting the order information. Although some methods are available for ordered group comparisons, they have rarely been applied to the analysis of microarray data. In this paper, we consider various statistical tests for identifying DEGs in comparisons involving more than two groups with ordered information (e.g., cancer stage and cell cycle data). We first consider a constrained ANOVA (CANOVA) model, which extends the ANOVA model that does not use order information, and then employ a proportional odds (PO) model, which extends a general logit model. Finally, a simple correlation-based approach is considered. Through extensive simulation studies, we evaluated the performance of the CANOVA, PO, and correlation approaches by comparing their sizes and powers. The three approaches were then applied to real microarray data from The Cancer Genome Atlas (TCGA). We specifically focused on the acute myeloid leukemia (AML) mRNA microarray data set and used the results of cytogenetic analyses (good, intermediate, and poor risk) as the ordered group information for AML. To identify the genes related to these risk categories, we selected 25 good samples, 25 intermediate samples, and 25 poor samples from the TCGA data set.
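    The correlation-based approach mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which correlation measure, group scores, or p-value method the authors used, so everything below is an assumption made for illustration: the three ordered groups are coded 0, 1, 2, the statistic is the Pearson correlation between expression and group score, and significance comes from a permutation test. The function names and the simulated data are hypothetical; only the 25/25/25 group sizes come from the abstract.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_pvalue(expr, scores, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the expression/score correlation.

    Group labels are shuffled to build the null distribution (an assumed
    choice; the abstract does not describe how p-values were obtained).
    """
    rng = random.Random(seed)
    observed = abs(pearson(expr, scores))
    shuffled = list(scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(expr, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Simulated data: 25 samples per ordered group, coded 0 < 1 < 2 (assumed coding).
rng = random.Random(1)
scores = [0] * 25 + [1] * 25 + [2] * 25
trend_gene = [0.8 * s + rng.gauss(0, 1) for s in scores]  # expression rises with group
null_gene = [rng.gauss(0, 1) for _ in scores]             # no group effect

r_trend = pearson(trend_gene, scores)
p_trend = permutation_pvalue(trend_gene, scores)
p_null = permutation_pvalue(null_gene, scores)
print(f"trend gene: r={r_trend:.3f}, p={p_trend:.4f}")
print(f"null gene:  p={p_null:.4f}")
```

    In a genome-wide setting this test would be repeated for every gene and followed by a multiple-testing correction, a step the abstract does not describe and that is omitted here.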
  • Keywords
    RNA; bioinformatics; cancer; cellular biophysics; genetics; genomics; lab-on-a-chip; molecular biophysics; statistical analysis; AML mRNA microarray data set; TCGA; The Cancer Genome Atlas; acute myeloid leukemia; analysis of variance models; cell cycle; constraint ANOVA models; cytogenetic analysis; differentially expressed gene identification; general logit model; multigroup analyses; ordinal phenotypes; permutation test; proportional odds model; statistical methods; t-test; Analysis of variance; Analytical models; Bioinformatics; Cancer; Correlation; Data analysis; ANOVA; acute myeloid leukemia (AML); baseline category model; constrained ANOVA; correlation; differentially expressed genes (DEG); microarray; ordinal restriction; proportional odds;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
  • Conference_Location
    Belfast
  • Type
    conf
  • DOI
    10.1109/BIBM.2014.6999152
  • Filename
    6999152