DocumentCode :
3756950
Title :
Random Forest with Random Projection to Impute Missing Gene Expression Data
Author :
Lovedeep Gondara
Author_Institution :
Dept. of Comput. Sci., Univ. of Illinois, Springfield, IL, USA
fYear :
2015
Firstpage :
1251
Lastpage :
1256
Abstract :
Measurement error or the lack of a proper experimental setup often results in invalid or missing data in gene expression studies. Small sample sizes and the cost of re-running experiments create a need for an efficient missing-data imputation technique. In this paper, we propose a method based on Random forest, using Random projection as a data pre-processing filter. Initial results using varying missing data proportions on a variety of real datasets show that the imputation process based on Random forest performs as well as or better than K-Nearest Neighbor and Support Vector Regression based methods. Using Random projection, we show that the dimensionality of a dataset can be reduced by 50 percent without affecting the imputation process.
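Illustrative sketch (not part of the original record): the pre-processing step named in the abstract, random projection, can be sketched as below. This is a minimal example under stated assumptions, not the paper's implementation: the abstract does not say which projection matrix is used or which axis is reduced, so a Gaussian matrix applied to the feature columns is assumed here. The function name `gaussian_random_projection`, the seeds, and the toy data are hypothetical, and the Random forest imputation step itself is not shown.

```python
import math
import random


def gaussian_random_projection(rows, target_dim, seed=0):
    """Project each row (a list of floats) onto `target_dim` random directions."""
    rng = random.Random(seed)
    source_dim = len(rows[0])
    # Assumption: entries drawn from N(0, 1/target_dim), a common choice that
    # preserves squared lengths in expectation.
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(target_dim)]
        for _ in range(source_dim)
    ]
    # Plain matrix product: (n_samples x source_dim) @ (source_dim x target_dim).
    return [
        [sum(row[i] * matrix[i][j] for i in range(source_dim))
         for j in range(target_dim)]
        for row in rows
    ]


# Toy stand-in for an expression matrix: 6 samples x 40 features (made-up values).
data_rng = random.Random(1)
expression = [[data_rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(6)]

# Halve the dimensionality, mirroring the 50 percent reduction in the abstract.
reduced = gaussian_random_projection(expression, target_dim=20)
print(len(reduced), len(reduced[0]))
```

In a full pipeline, an imputation model would then be fitted on the reduced matrix rather than on the original one; how missing entries are handled before projection is not described in the abstract and is left open here.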
Keywords :
"Gene expression","Radio frequency","Support vector machines","Principal component analysis","Data models","Correlation","Vegetation"
Publisher :
IEEE
Conference_Title :
Machine Learning and Applications (ICMLA), 2015 IEEE 14th International Conference on
Type :
conf
DOI :
10.1109/ICMLA.2015.29
Filename :
7424493