DocumentCode
3756950
Title
Random Forest with Random Projection to Impute Missing Gene Expression Data
Author
Lovedeep Gondara
Author_Institution
Dept. of Comput. Sci., Univ. of Illinois, Springfield, IL, USA
fYear
2015
Firstpage
1251
Lastpage
1256
Abstract
Measurement error or lack of proper experimental setup often results in invalid or missing data in gene expression studies. Small sample sizes and the cost of re-running experiments create a need for an efficient missing data imputation technique. In this paper, we propose a method based on Random forest that uses Random projection as a data pre-processing filter. Initial results using varying proportions of missing data on a variety of real datasets show that the imputation process based on Random forest performs as well as or better than methods based on K-Nearest Neighbor and Support Vector Regression. Using Random projection, we show that the dimensionality of a dataset can be reduced by 50 percent without affecting the imputation process.
Keywords
"Gene expression","Radio frequency","Support vector machines","Principal component analysis","Data models","Correlation","Vegetation"
Publisher
ieee
Conference_Titel
Machine Learning and Applications (ICMLA), 2015 IEEE 14th International Conference on
Type
conf
DOI
10.1109/ICMLA.2015.29
Filename
7424493