Author_Institution :
Dept. of Software & Inf. Sci., Iwate Prefectural Univ., Takizawamura, Japan
Abstract :
Using microarray technology, one can compare the expression levels of several thousand genes from two sample cells. Depending on the source of the samples, important investigations can be carried out, such as tracking disease progress, accurate diagnosis, drug response, and prognosis after treatment. The aim of this work is to identify the smallest set of genes whose expression can classify a target disease with the highest accuracy. We present a two-stage feature reduction. In Stage 1, the number of genes is reduced from thousands to N (N ? 100) by considering the relevance of each gene for classification individually. We propose a fast algorithm for this stage that can efficiently handle data with more than two classes. In Stage 2, we search for the smallest subset of the genes selected in Stage 1 that gives the highest classification accuracy, using a wrapper method for feature selection. Selecting the optimum subset of genes is a combinatorial optimization problem with two optimization criteria: minimizing the cardinality of the subset and maximizing the classification accuracy. A multi-objective genetic algorithm is used, and we propose a new Pareto front that is suitable for the gene-selection problem. The effectiveness of our two-stage algorithm is verified on three benchmark gene-expression datasets: the Lung Cancer dataset, the SRBC dataset, and the MLL dataset.
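The two Stage 2 criteria (fewer genes, higher accuracy) can be illustrated with a minimal sketch of standard Pareto dominance. This is not the new Pareto front proposed in the paper, which the abstract does not specify; the gene names, accuracy values, and the helpers `dominates` and `pareto_front` are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, List, Tuple

# A candidate is (gene subset, classification accuracy); values below are made up.
Candidate = Tuple[FrozenSet[str], float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both criteria and strictly better on one."""
    size_a, acc_a = len(a[0]), a[1]
    size_b, acc_b = len(b[0]), b[1]
    no_worse = size_a <= size_b and acc_a >= acc_b
    strictly_better = size_a < size_b or acc_a > acc_b
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (standard definition)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical subsets and accuracies, as a wrapper classifier might report them.
candidates: List[Candidate] = [
    (frozenset({"g1", "g2", "g3"}), 0.95),
    (frozenset({"g1", "g2"}), 0.95),
    (frozenset({"g4"}), 0.80),
    (frozenset({"g1", "g5"}), 0.90),
]

front = pareto_front(candidates)
for genes, acc in front:
    print(sorted(genes), acc)
```

In this toy example the two-gene subset with accuracy 0.95 and the one-gene subset with accuracy 0.80 remain on the front; the other two candidates are dominated because a subset of equal or smaller size reaches equal or higher accuracy.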
Keywords :
Pareto optimisation; cancer; combinatorial mathematics; data handling; diseases; feature selection; genetic algorithms; genetics; lab-on-a-chip; lung; medical computing; minimisation; pattern classification; MLL dataset; Pareto GA; Pareto front; SRBC dataset; cardinality minimization; combinatorial optimization problem; disease classification; feature reduction; gene expression data; gene identification; gene selection; gene selection problem; genetic algorithm; lung cancer dataset; microarray data; multiobjective optimization; wrapper method; Biological cells; Gene expression; Optimization; Sociology; Statistics; DNA microarray gene-expression data; Pareto Genetic Algorithm
Conference_Title :
Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on