DocumentCode :
3491485
Title :
A Population-Based Incremental Learning approach to microarray gene expression feature selection
Author :
Perez, Meir ; Rubin, David M. ; Marwala, Tshilidzi ; Scott, Lesley E. ; Stevens, Wendy
Author_Institution :
Dept. of Electr. & Electron. Eng. Technol., Univ. of Johannesburg, Johannesburg, South Africa
fYear :
2010
fDate :
17-20 Nov. 2010
Abstract :
The identification of a differentially expressed set of genes is essential in microarray data analysis, both for identifying novel oncogenic pathways and for automated diagnosis. This paper assesses the effectiveness of the Population-Based Incremental Learning (PBIL) algorithm in identifying a class-differentiating gene set for sample classification. PBIL iteratively evolves the genome of a search population by updating a probability vector, guided by the class separability achieved by each combination of features. PBIL is compared both to a standard Genetic Algorithm (GA) and to Analysis of Variance (ANOVA). The algorithms are tested on a publicly available three-class leukaemia microarray data set (n = 72). Over 30 repeated runs of each algorithm, PBIL achieved an average feature-space separability of 97.04%, while GA achieved an average class separability of 96.39%. PBIL also found smaller feature spaces than GA (PBIL: 326 genes; GA: 2652 genes), thus excluding a large percentage of redundant features. On average, PBIL also outperformed the ANOVA approach for n = 2652 (91.62%), q < 0.05 (94.44%), q < 0.01 (93.06%) and q < 0.005 (95.83%). The best PBIL run (98.61%) even outperformed ANOVA for n = 326 and q < 0.001 (both 97.22%). PBIL's performance is ascribed to its ability to direct the search not only towards the optimal solution but also away from the worst.
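The abstract describes PBIL only at a high level. Below is a minimal Python sketch of the standard PBIL loop applied to feature selection, assuming a binary mask per candidate (1 = gene kept), a positive update towards each generation's best mask, and a negative-learning step that pushes the probability vector away from the worst mask where best and worst disagree. The fitness function `toy_fitness`, the `INFORMATIVE` gene set, and all rates and sizes are hypothetical placeholders; the paper's class-separability measure and parameter settings are not given in this record.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def pbil(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    pop_size: int = 50,
    generations: int = 100,   # assumed >= 1
    lr: float = 0.1,          # positive learning rate (illustrative value)
    neg_lr: float = 0.075,    # negative learning rate (illustrative value)
    mut_prob: float = 0.02,   # per-bit chance of mutating the probability vector
    mut_shift: float = 0.05,  # size of a probability-vector mutation
    seed: int = 0,
) -> Tuple[Optional[List[int]], float]:
    """Search for a binary feature mask (1 = gene kept) that maximises `fitness`."""
    rng = random.Random(seed)
    p = [0.5] * n_features  # P(feature i is selected); start unbiased
    best_mask: Optional[List[int]] = None
    best_score = float("-inf")

    for _ in range(generations):
        # Sample a population of candidate feature subsets from the probability vector.
        population = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(pop_size)]
        scored = sorted(((fitness(m), m) for m in population), key=lambda t: t[0])
        _, worst = scored[0]
        gen_score, gen_best = scored[-1]
        if gen_score > best_score:
            best_mask, best_score = list(gen_best), gen_score

        for i in range(n_features):
            # Positive learning: shift p[i] towards the generation's best mask.
            p[i] = p[i] * (1 - lr) + gen_best[i] * lr
            # Negative learning: where best and worst disagree, shift p[i]
            # further towards the best (i.e. away from the worst).
            if gen_best[i] != worst[i]:
                p[i] = p[i] * (1 - neg_lr) + gen_best[i] * neg_lr
            # Mutation: occasionally nudge p[i] towards a random bit to keep diversity.
            if rng.random() < mut_prob:
                p[i] = p[i] * (1 - mut_shift) + rng.randint(0, 1) * mut_shift

    return best_mask, best_score


# Hypothetical toy fitness (NOT the paper's separability measure): reward a hidden
# set of "informative" genes and penalise every selected gene slightly.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask: Sequence[int]) -> float:
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.05 * sum(mask)


if __name__ == "__main__":
    mask, score = pbil(20, toy_fitness, seed=0)
    print([i for i, bit in enumerate(mask) if bit], round(score, 2))
```

The small per-gene penalty in the toy fitness loosely mirrors the abstract's observation that PBIL favoured compact gene sets; in a real run the fitness would instead score the class separability of the selected genes on the training samples.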
Keywords :
biology computing; data analysis; genetic algorithms; genomics; learning (artificial intelligence); probability; statistical analysis; PBIL; analysis of variance; average feature-space separability; genetic algorithm; microarray data analysis; microarray gene expression feature selection; oncogenic pathway identification; population-based incremental learning algorithm; probability vector; sample classification; three-class leukaemia microarray data set; Analysis of variance; Bioinformatics; Genetic algorithms; Genomics; Indexes;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2010 IEEE 26th Convention of Electrical and Electronics Engineers in Israel (IEEEI)
Conference_Location :
Eilat
Print_ISBN :
978-1-4244-8681-6
Type :
conf
DOI :
10.1109/EEEI.2010.5661897
Filename :
5661897