Title :
A hybrid feature selection algorithm used in disease association study
Author :
Wei, Bin ; Peng, Qinke ; Kang, Xuejiao ; Li, Chenyao
Author_Institution :
Syst. Eng. Inst. of Electron. & Inf. Eng. Sch., Xi'an Jiaotong Univ., Xi'an, China
Abstract :
With the rapid development of high-throughput genotyping technologies, increasing attention is being paid to disease association studies, which identify DNA variations that are highly associated with a specific disease. A main challenge in such studies is to find the optimal subsets of single nucleotide polymorphisms (SNPs) that are most tightly associated with disease. Feature selection, which can effectively reduce computational complexity, has become a necessity in many bioinformatics applications. We therefore present a prediction algorithm based on a support vector machine (SVM) with a hybrid feature selection method named F-score and compact GA (FCGA). FCGA combines the advantages of the filter and wrapper methods: it not only eliminates feature redundancy and reduces computing time, but also solves the problem of selecting the SVM's parameters. We apply this prediction algorithm to a lung cancer dataset of 595 samples, each with 141 SNPs. To evaluate its prediction accuracy, we compare it with Naive Bayes along with several commonly used feature selection methods. The experimental results show that the proposed algorithm achieves the highest accuracy among the compared methods.
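As an illustration of the filter stage only, the sketch below ranks features by F-score. It assumes the commonly used two-class F-score definition (between-class separation of the feature means divided by the sum of the within-class sample variances); the abstract does not state which variant the authors use. The function names `f_score` and `rank_features`, the 0/1/2 genotype coding, and the toy data are hypothetical. The compact GA wrapper and the SVM are not shown.

```python
from statistics import mean, variance


def f_score(column, labels):
    """F-score of one feature for binary labels (1 = case, 0 = control).

    Assumed definition: squared distances of the class means from the
    overall mean, divided by the sum of the per-class sample variances.
    """
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    overall = mean(column)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1)
    # A feature that is constant within both classes gets score 0 here
    # (a simplifying assumption to avoid division by zero).
    return between / within if within > 0 else 0.0


def rank_features(rows, labels):
    """Return feature indices ordered from highest to lowest F-score."""
    n_features = len(rows[0])
    scores = [f_score([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Hypothetical toy data: 6 samples, 3 SNPs coded as 0/1/2 genotype counts.
rows = [
    [2, 0, 1],
    [2, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
]
labels = [1, 1, 1, 0, 0, 0]

ranking = rank_features(rows, labels)
top_score = f_score([r[0] for r in rows], labels)
```

In this toy data the first SNP separates cases from controls most clearly, so it ranks first. In a hybrid scheme of the kind the abstract describes, a filter ranking like this would typically shrink the candidate set before a wrapper search evaluates subsets with the classifier.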
Keywords :
Bayes methods; bioinformatics; computational complexity; diseases; feature extraction; support vector machines; DNA variations; F-score method; Naive Bayes method; SVM parameters selection; bioinformatics applications; computational complexity reduction; disease association study; feature redundancy; high throughput genotyping technologies; hybrid feature selection algorithm; prediction algorithm; single nucleotide polymorphism; support vector machine; Accuracy; Cancer; Classification algorithms; Diseases; Filtering algorithms; Prediction algorithms; Support vector machines; Disease association study; Feature selection; SNP; SVM;
Conference_Title :
Intelligent Control and Automation (WCICA), 2010 8th World Congress on
Conference_Location :
Jinan
Print_ISBN :
978-1-4244-6712-9
DOI :
10.1109/WCICA.2010.5554442