• DocumentCode
    3756950
  • Title
    Random Forest with Random Projection to Impute Missing Gene Expression Data
  • Author
    Lovedeep Gondara
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Illinois, Springfield, IL, USA
  • fYear
    2015
  • Firstpage
    1251
  • Lastpage
    1256
  • Abstract
    Measurement error or lack of a proper experimental setup often results in invalid or missing data in gene expression studies. Small sample sizes and the cost of re-running the experiment present a need for an efficient missing data imputation technique. In this paper, we propose a method based on Random forest using Random projection as a data pre-processing filter. Initial results using varying missing data proportions on a variety of real datasets show that the imputation process based on Random forest performs as well as or better than K-Nearest Neighbor and Support Vector Regression based methods. Using Random projection, we show that the dimensionality of a dataset can be reduced by 50 percent without affecting the imputation process.
  • Keywords
    "Gene expression","Radio frequency","Support vector machines","Principal component analysis","Data models","Correlation","Vegetation"
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications (ICMLA), 2015 IEEE 14th International Conference on
  • Type
    conf
  • DOI
    10.1109/ICMLA.2015.29
  • Filename
    7424493