DocumentCode :
1085650
Title :
Sparse Support Vector Machines with L_{p} Penalty for Biomarker Identification
Author :
Liu, Zhenqiu ; Lin, Shili ; Tan, Ming T.
Author_Institution :
Dept. of Epidemiology & Preventive Med., Univ. of Maryland, Baltimore, MD, USA
Volume :
7
Issue :
1
fYear :
2010
Firstpage :
100
Lastpage :
107
Abstract :
The development of high-throughput technology has generated a massive amount of high-dimensional data, much of which is of discrete type. Robust and efficient learning algorithms such as LASSO [1] are required for feature selection and overfitting control. However, most feature selection algorithms are only applicable to continuous data. In this paper, we propose a novel method for sparse support vector machines (SVMs) with Lp (p < 1) regularization. Efficient algorithms (LpSVM) are developed for learning a classifier that is applicable to high-dimensional data sets with both discrete and continuous data types. The regularization parameters are estimated by maximizing the area under the ROC curve (AUC) on the cross-validation data. Experimental results on protein sequence and SNP data attest to the accuracy, sparsity, and efficiency of the proposed algorithm. Biomarkers identified with our methods are compared with those from other methods in the literature. The software package in Matlab is available upon request.
Keywords :
biology computing; learning (artificial intelligence); mathematics computing; molecular biophysics; molecular configurations; parameter estimation; proteins; sensitivity analysis; support vector machines; Lp penalty; Matlab; ROC curve; biomarker identification; high-dimensional data sets; learning algorithms; protein sequence; regularization parameter estimation; single nucleotide polymorphism; sparse support vector machines; Embedded method; I.5.2.b Feature evaluation and selection; I.5.2.c Pattern analysis; I.5.4.h Medicine; J.3.a Biology and genetics; L_{p} regularization; SNP data analysis; SVM; feature selection; protease data analysis; Algorithms; Artificial Intelligence; Biological Markers; Gene Expression Profiling; Multigene Family; Pattern Recognition, Automated
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2008.17
Filename :
4459304