DocumentCode
592528
Title
A new feature selection algorithm for two-class classification problems and application to endometrial cancer
Author
Ahsen, M. Eren ; Singh, Neeraj Kumar ; Boren, T. ; Vidyasagar, M. ; White, M.A.
Author_Institution
Dept. of Bioeng., Univ. of Texas at Dallas, Richardson, TX, USA
fYear
2012
fDate
10-13 Dec. 2012
Firstpage
2976
Lastpage
2982
Abstract
In this paper, we introduce a new feature selection algorithm for two-class classification problems, called ℓ1-StaR. The algorithm first extracts the statistically relevant features using the Student t-test, and then passes the reduced feature set to an ℓ1-norm support vector machine (SVM) with recursive feature elimination (RFE). The final number of features chosen by the ℓ1-StaR algorithm can be smaller than the number of samples, unlike with ℓ1-norm regression, where the final number of features is bounded below by the number of samples. The algorithm is illustrated by applying it to the problem of determining which endometrial cancer patients are at risk of their cancer spreading to the lymph nodes. The data consisted of measurements of 1,428 micro-RNAs on 94 patient samples (divided evenly between those with lymph node metastasis and those without). Using the algorithm, we identified a subset of just 15 micro-RNAs, together with a linear classifier based on them, that achieved two-fold cross-validation accuracies in excess of 80%, and combined accuracy, sensitivity and specificity in excess of 93%.
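The two-stage pipeline described in the abstract (a t-test filter followed by an ℓ1-norm SVM with RFE) can be sketched as below. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. Assumptions: the standard hinge-loss ℓ1-norm SVM (usually solved as a linear program) is swapped for an ℓ1-penalized squared-hinge linear classifier fit by proximal gradient descent; the t-test p-value uses a normal approximation to the t distribution; features are z-scored before training; RFE drops one smallest-|w| feature per round until `k` remain. The function names, the parameters `alpha`, `k`, `lam`, and the synthetic data are all hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def t_test_filter(X, y, alpha):
    """Indices of features whose two-sample Student t-test p-value < alpha.

    Pooled-variance t statistic; the two-sided p-value uses a normal
    approximation to the t distribution (assumes moderately large groups).
    """
    pos = [row for row, lbl in zip(X, y) if lbl == 1]
    neg = [row for row, lbl in zip(X, y) if lbl == -1]
    n1, n2 = len(pos), len(neg)
    keep = []
    for j in range(len(X[0])):
        a, b = [r[j] for r in pos], [r[j] for r in neg]
        sp2 = ((n1 - 1) * statistics.variance(a)
               + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
        se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
        t = (statistics.fmean(a) - statistics.fmean(b)) / se if se > 0 else 0.0
        if 2 * (1 - NormalDist().cdf(abs(t))) < alpha:
            keep.append(j)
    return keep


def standardize(X, cols):
    """Z-score the selected columns so that weight magnitudes are comparable."""
    Z = [[0.0] * len(cols) for _ in X]
    for k, j in enumerate(cols):
        col = [row[j] for row in X]
        mu, sd = statistics.fmean(col), statistics.pstdev(col) or 1.0
        for i, v in enumerate(col):
            Z[i][k] = (v - mu) / sd
    return Z


def l1_svm(Z, y, lam=0.01, iters=2000):
    """Minimize (1/n) sum max(0, 1 - y_i (w.z_i + b))^2 + lam * ||w||_1 (ISTA)."""
    n, p = len(Z), len(Z[0])
    # Step 1/L, with L = 2 * trace((1/n) A^T A), A = [Z, 1], bounding the
    # Lipschitz constant of the squared-hinge gradient.
    step = n / (2 * sum(sum(v * v for v in z) + 1.0 for z in Z))
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * p, 0.0
        for z, yi in zip(Z, y):
            slack = 1 - yi * (sum(wj * zj for wj, zj in zip(w, z)) + b)
            if slack > 0:  # only margin violators contribute to the gradient
                c = -2 * yi * slack / n
                gb += c
                for j in range(p):
                    gw[j] += c * z[j]
        b -= step * gb  # the bias is not penalized
        for j in range(p):
            v = w[j] - step * gw[j]
            # Soft-thresholding (prox of step * lam * |w_j|) zeroes weak weights.
            w[j] = max(abs(v) - step * lam, 0.0) * (1.0 if v > 0 else -1.0)
    return w, b


def l1_star(X, y, alpha, k, lam=0.01):
    """Hypothetical l1-StaR-style pipeline: t-test filter, then l1-SVM + RFE."""
    cols = t_test_filter(X, y, alpha)
    while True:
        w, b = l1_svm(standardize(X, cols), y, lam)
        if len(cols) <= k:
            return cols, w, b
        # RFE step: drop the surviving feature with the smallest |weight|.
        worst = min(range(len(cols)), key=lambda j: abs(w[j]))
        cols = cols[:worst] + cols[worst + 1:]


def scores(Z, y, w, b):
    """Accuracy, sensitivity (true-positive rate), specificity (true-negative rate)."""
    pred = [1 if sum(wj * zj for wj, zj in zip(w, z)) + b >= 0 else -1 for z in Z]
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, y))
    tn = sum(p == -1 and t == -1 for p, t in zip(pred, y))
    n_pos = sum(t == 1 for t in y)
    return (tp + tn) / len(y), tp / n_pos, tn / (len(y) - n_pos)


if __name__ == "__main__":
    # Synthetic data (not the paper's): features 0 and 1 carry signal, 2..9 are noise.
    rng = random.Random(0)
    y = [1] * 20 + [-1] * 20
    X = [[2 * yi + rng.gauss(0, 1), 1.5 * yi + rng.gauss(0, 1)]
         + [rng.gauss(0, 1) for _ in range(8)] for yi in y]
    cols, w, b = l1_star(X, y, alpha=0.001, k=2)
    print("selected:", cols, "train acc/sens/spec:", scores(standardize(X, cols), y, w, b))
```

The filter stage is cheap and shrinks the feature set before the more expensive iterative SVM/RFE loop, which is the motivation the abstract gives for the two-stage design. The demo reports training-set figures only; reproducing the paper's two-fold cross-validation would require refitting (including the standardization statistics) on each training fold.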
Keywords
RNA; biology computing; cancer; feature extraction; patient diagnosis; pattern classification; statistical testing; support vector machines; RFE; SVM; Student t-test; endometrial cancer; endometrial cancer patients; feature selection algorithm; l1-StaR; l1-norm regression; l1-norm support vector machine; linear classifier; lymph node metastasis; lymph nodes; microRNA; recursive feature elimination; reduced feature set; statistically relevant feature extraction; two-class classification problems; Lymph nodes; Metastasis; Sensitivity; Support vector machines; Tumors; Vectors;
fLanguage
English
Publisher
ieee
Conference_Title
Decision and Control (CDC), 2012 IEEE 51st Annual Conference on
Conference_Location
Maui, HI
ISSN
0743-1546
Print_ISBN
978-1-4673-2065-8
Electronic_ISBN
0743-1546
Type
conf
DOI
10.1109/CDC.2012.6426819
Filename
6426819