Title :
Filter vs. Wrapper approach for optimum gene selection of high dimensional gene expression dataset: An analysis with cancer datasets
Author :
Srivastava, Bhavna ; Srivastava, Rajeev ; Jangid, Mahesh
Author_Institution :
Dept. of Comput. Sci. & Eng., Manipal Univ., Jaipur, India
Abstract :
In bioinformatics, gene expression experiments generate thousands of gene expression measurements, which are generally used to collect information from tissue and cell samples about differences in gene expression. Selecting an optimal set of genes from such datasets and classifying the samples play an important role in disease prediction and diagnosis. The open question is which gene selection strategy yields the highest classification accuracy on such high-dimensional gene expression data: whether a filter approach or a wrapper approach is more suitable, and which classifier works well with each. To answer this question, this paper evaluates filter and wrapper gene selection techniques with supervised classifiers on three well-known public-domain datasets: Ovarian Cancer, Lymphomas, and Leukemia. The ReliefF method is used for filter-based gene selection, and a random gene subset selection algorithm is used for wrapper-based gene selection. For classification, several linear classifiers and ensemble classifiers are tested. The paper also reports timing details, so that the analysis can show which approach offers the better balance of running time and accuracy on the selected dataset.
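As a rough illustration of the wrapper idea named in the abstract, the sketch below scores randomly drawn gene subsets by the leave-one-out accuracy of a classifier and keeps the best subset. It is not the paper's algorithm: the abstract does not give the details of its random gene subset selection procedure or its classifiers, so the nearest-centroid classifier, the function names (`centroid_accuracy`, `random_subset_wrapper`, `make_toy_data`), the parameters, and the synthetic data are all assumptions made for illustration.

```python
import random


def centroid_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen genes."""
    correct = 0
    for i in range(len(X)):
        # Build per-class sums from every sample except the held-out one.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(genes))
            for k, g in enumerate(genes):
                acc[k] += row[g]
            counts[label] = counts.get(label, 0) + 1

        # Squared Euclidean distance from the held-out sample to a class centroid.
        def dist(label):
            return sum((X[i][g] - sums[label][k] / counts[label]) ** 2
                       for k, g in enumerate(genes))

        if min(sums, key=dist) == y[i]:
            correct += 1
    return correct / len(X)


def random_subset_wrapper(X, y, subset_size, n_trials, seed=0):
    """Hypothetical wrapper: score random gene subsets, keep the most accurate one."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    best_genes, best_acc = None, -1.0
    for _ in range(n_trials):
        genes = sorted(rng.sample(range(n_genes), subset_size))
        acc = centroid_accuracy(X, y, genes)
        if acc > best_acc:
            best_genes, best_acc = genes, acc
    return best_genes, best_acc


def make_toy_data(n_samples=20, n_genes=30, seed=1):
    """Synthetic two-class data (an assumption): only gene 0 separates the classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        row[0] += 10.0 * label  # shift gene 0 for class 1
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    genes, acc = random_subset_wrapper(X, y, subset_size=3, n_trials=50)
    print("best subset:", genes, "leave-one-out accuracy:", acc)
```

Because every candidate subset requires retraining and re-evaluating the classifier, the cost of a wrapper of this kind grows with the number of trials. A filter such as ReliefF instead ranks genes without training a classifier, and this difference in cost is the kind of accuracy-versus-time trade-off the abstract sets out to measure.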
Keywords :
bioinformatics; diseases; genetics; medical computing; pattern classification; ReliefF method; bioinformatics; cancer datasets; cell samples; disease diagnosis; disease prediction; filter approach; filter based gene selection; gene dataset experiments; gene expression measurements; high dimensional gene expression dataset; leukemia; lymphomas; maximum classification accuracy; optimum gene selection; ovarian cancer; public domain datasets; random gene subset selection algorithm; supervised classifiers; time management; tissue samples; wrapper approach; Accuracy; Algorithm design and analysis; Bioinformatics; Classification algorithms; Niobium; Prediction algorithms; Support vector machines; Ensemble classifier; Gene selection; Linear classifier; Random gene subset selection; ReliefF;
Conference_Title :
High Performance Computing and Applications (ICHPCA), 2014 International Conference on
Print_ISBN :
978-1-4799-5957-0
DOI :
10.1109/ICHPCA.2014.7045359