• DocumentCode
    3491485
  • Title

    A Population-Based Incremental Learning approach to microarray gene expression feature selection

  • Author

    Perez, Meir ; Rubin, David M. ; Marwala, Tshilidzi ; Scott, Lesley E. ; Stevens, Wendy

  • Author_Institution
    Dept. of Electr. & Electron. Eng. Technol., Univ. of Johannesburg, Johannesburg, South Africa
  • fYear
    2010
  • fDate
    17-20 Nov. 2010
  • Abstract
    The identification of a differentially expressed set of genes in microarray data analysis is essential, both for novel oncogenic pathway identification and for automated diagnostic purposes. This paper assesses the effectiveness of the Population-Based Incremental Learning (PBIL) algorithm in identifying a class-differentiating gene set for sample classification. PBIL iteratively evolves the genome of a search population by updating a probability vector, guided by the extent of class-separability demonstrated by a combination of features. PBIL is compared both to a standard Genetic Algorithm (GA) and to an Analysis of Variance (ANOVA) approach. The algorithms are tested on a publicly available three-class leukaemia microarray data set (n = 72). Over 30 repeats of both GA and PBIL, PBIL found an average feature-space separability of 97.04%, while GA achieved an average class-separability of 96.39%. PBIL also found smaller feature-spaces than GA (326 genes for PBIL versus 2652 for GA), thus excluding a large percentage of redundant features. It also, on average, outperformed the ANOVA approach for n = 2652 (91.62%), q < 0.05 (94.44%), q < 0.01 (93.06%) and q < 0.005 (95.83%). The best PBIL run (98.61%) even outperformed ANOVA for n = 326 and q < 0.001 (both 97.22%). PBIL's performance is ascribed to its ability to direct the search not only towards the optimal solution, but also away from the worst.
  • Keywords
    biology computing; data analysis; genetic algorithms; genomics; learning (artificial intelligence); probability; statistical analysis; PBIL; analysis of variance; average feature-space separability; genetic algorithm; microarray data analysis; microarray gene expression feature selection; oncogenic pathway identification; population-based incremental learning algorithm; probability vector; sample classification; three-class leukaemia microarray data set; Analysis of variance; Bioinformatics; Gallium; Genetic algorithms; Genomics; Indexes; Silicon;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Electrical and Electronics Engineers in Israel (IEEEI), 2010 IEEE 26th Convention of
  • Conference_Location
    Eilat
  • Print_ISBN
    978-1-4244-8681-6
  • Type

    conf

  • DOI
    10.1109/EEEI.2010.5661897
  • Filename
    5661897
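
The abstract describes PBIL as updating a probability vector so that the search moves towards the best solution and away from the worst. The following is a minimal Python sketch of that general update rule, not the authors' implementation. The function names (`pbil`, `toy_fitness`), the learning rates, the population size, the mutation settings, and the toy fitness function are all assumptions chosen for illustration; the paper's fitness is a class-separability measure on microarray data, which is not reproduced here.

```python
import random


def pbil(fitness, n_bits, pop_size=50, generations=100,
         lr=0.1, neg_lr=0.075, mut_prob=0.02, mut_shift=0.05, seed=0):
    """Generic PBIL sketch: evolve a probability vector over bit masks.

    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    p = [0.5] * n_bits                      # start with no preference per feature
    best_mask, best_score = None, float("-inf")

    for _ in range(generations):
        # Sample a population of feature masks from the probability vector.
        pop = [[1 if rng.random() < pi else 0 for pi in p]
               for _ in range(pop_size)]
        ranked = sorted(pop, key=fitness)
        worst, best = ranked[0], ranked[-1]

        score = fitness(best)
        if score > best_score:
            best_mask, best_score = best[:], score

        for i in range(n_bits):
            # Move towards the best mask of this generation.
            p[i] = p[i] * (1 - lr) + best[i] * lr
            # Move away from the worst mask where it disagrees with the best.
            if best[i] != worst[i]:
                p[i] = p[i] * (1 - neg_lr) + best[i] * neg_lr
            # Occasionally nudge the probability to keep some diversity.
            if rng.random() < mut_prob:
                p[i] = p[i] * (1 - mut_shift) + rng.randint(0, 1) * mut_shift

    return best_mask, best_score, p


# Hypothetical toy problem: features 0-2 are "informative", and every
# selected feature costs a small penalty, so smaller masks are preferred.
INFORMATIVE = {0, 1, 2}


def toy_fitness(mask):
    reward = sum(mask[i] for i in INFORMATIVE)
    return reward - 0.1 * sum(mask)


mask, score, probs = pbil(toy_fitness, n_bits=20)
print("selected features:", [i for i, bit in enumerate(mask) if bit])
print("score:", round(score, 3))
```

In this sketch each bit of a mask stands for one gene being included or excluded, so the penalty term plays the role of favouring smaller feature sets. Replacing `toy_fitness` with a separability score computed on real expression data would be the step needed to approximate the setting the abstract describes.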