DocumentCode :
1134677
Title :
A Sparse Learning Machine for High-Dimensional Data with Application to Microarray Gene Analysis
Author :
Cheng, Qiang
Author_Institution :
Comput. Sci. Dept., Southern Illinois Univ. Carbondale, Carbondale, IL, USA
Volume :
7
Issue :
4
fYear :
2010
Firstpage :
636
Lastpage :
646
Abstract :
Extracting features from high-dimensional data is a critically important task for pattern recognition and machine learning applications. High-dimensional data typically have many more variables than observations and contain significant noise, missing components, or outliers. Features extracted from high-dimensional data need to be discriminative and sparse, and need to capture essential characteristics of the data. In this paper, we present a way to construct multivariate features and then classify the data into proper classes. The resulting small subset of features is nearly the best in the sense of Greenshtein's persistence; however, the estimated feature weights may be biased. We take a systematic approach to correcting the biases. We use conjugate gradient-based primal-dual interior-point techniques for large-scale problems. We apply our procedure to microarray gene analysis. The effectiveness of our method is confirmed by experimental results.
Keywords :
biology computing; feature extraction; genetics; learning (artificial intelligence); Greenshtein persistence; conjugate gradient-based primal-dual interior-point techniques; feature extraction; high-dimensional data; large-scale problems; machine learning; microarray gene analysis; multivariate features; pattern recognition; sparse learning machine; Bioinformatics; Cancer; Data mining; Feature extraction; Large-scale systems; Machine learning; Pattern analysis; Pattern recognition; Support vector machine classification; Support vector machines; High-dimensional data; bias; cancer classification; convex optimization; feature selection; microarray gene analysis; persistence; primal-dual interior-point optimization; Artificial Intelligence; Gene Expression Profiling; Multivariate Analysis; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2009.8
Filename :
4770093