• DocumentCode
    2977251
  • Title

    Evaluation of missing values imputation methods in cDNA microarrays based on classification accuracy

  • Author

    Ghoneim, Vidan Fathi ; Solouma, Nahed H. ; Kadah, Yasser M.

  • Author_Institution
    Biomed. Eng. Dept., Misr Univ. for Sci. & Technol., 6th of October City, Egypt
  • fYear
    2011
  • fDate
    21-24 Feb. 2011
  • Firstpage
    367
  • Lastpage
    370
  • Abstract
    Many attempts have been made to deal with missing values (MV) in microarray data representing gene expression. Missing values are problematic because many data analysis techniques are not robust to missing data. Most MV imputation methods currently in use have been evaluated only in terms of the similarity between the original and imputed data. However, the imputed expression values themselves are of little interest; the major concern is whether they are reliable enough to use in subsequent analysis. This paper studies the impact of different MV imputation methods on classification accuracy. The experimental work first implemented three popular imputation methods, namely Singular Value Decomposition (SVD), weighted K-nearest neighbors (KNNimpute), and Zero replacement. The robustness of the three methods to the amount of missing data was then studied by repeating the experiments on datasets with different missing rates (MR) over the range of 0-20%. For supervised two-class classification, we adopted a twofold approach, presenting the classifiers with the expression values of all genes as well as with a subset of selected genes. The feature selection method used for gene selection is Fisher Discriminate Analysis (FDA), which noticeably improved the performance of the classifiers. The classifier accuracies obtained using data imputed by the three methods show only slight variations over the specified range of MR, indicating that the three imputation methods are robust.
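As an illustration of the imputation step described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a genes-by-samples matrix with missing entries marked as NaN, and it shows Zero replacement alongside a simplified inverse-distance-weighted KNN imputation. The function names (`mask_entries`, `zero_impute`, `knn_impute`), the synthetic data, and the choice of `k` are hypothetical. The similarity score printed at the end is exactly the kind of measure the abstract argues is insufficient on its own; the classification stage is not reproduced here.

```python
import numpy as np


def mask_entries(X, rate, rng):
    """Return a copy of X with roughly `rate` of its entries set to NaN."""
    X_missing = X.astype(float).copy()
    X_missing[rng.random(X.shape) < rate] = np.nan
    return X_missing


def zero_impute(X):
    """Zero replacement: every missing entry becomes 0."""
    return np.where(np.isnan(X), 0.0, X)


def knn_impute(X, k=3):
    """Simplified weighted KNN imputation over rows (genes).

    Assumption: neighbours are ranked by root-mean-square difference over
    the columns that both rows observe, and weighted by inverse distance.
    """
    filled = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        dists, values = [], []
        for r in range(X.shape[0]):
            if r == i or np.isnan(X[r, j]):
                continue  # a neighbour must have column j observed
            shared = ~np.isnan(X[i]) & ~np.isnan(X[r])
            if not shared.any():
                continue  # no common columns to compare on
            dists.append(np.sqrt(np.mean((X[i, shared] - X[r, shared]) ** 2)))
            values.append(X[r, j])
        if not dists:
            filled[i, j] = 0.0  # no usable neighbour: fall back to zero
            continue
        dists, values = np.array(dists), np.array(values)
        nearest = np.argsort(dists)[:k]
        weights = 1.0 / (dists[nearest] + 1e-12)
        filled[i, j] = np.sum(weights * values[nearest]) / np.sum(weights)
    return filled


def rmse(a, b):
    """Root-mean-square error between two equally shaped arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic low-rank "expression" matrix (hypothetical data, 40 genes x 8 samples).
rng = np.random.default_rng(0)
truth = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 8))

# Repeat the imputation at several missing rates within the 0-20% range.
for rate in (0.05, 0.10, 0.20):
    observed = mask_entries(truth, rate, rng)
    print(rate, rmse(zero_impute(observed), truth), rmse(knn_impute(observed), truth))
```

To follow the evaluation the abstract describes, each imputed matrix would then be passed to a classifier and compared on accuracy rather than on reconstruction error.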
  • Keywords
    DNA; biology computing; genetics; lab-on-a-chip; molecular biophysics; singular value decomposition; Fisher discriminate analysis; Zero replacement; feature selection method; gene expressions; gene selection; cDNA microarrays; missing values imputation methods; weighted K-nearest neighbors; Accuracy; Bioinformatics; Euclidean distance; Gene expression; Robustness; Sensitivity; Support vector machines; classification; evaluation; imputation; microarrays
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Biomedical Engineering (MECBME), 2011 1st Middle East Conference on
  • Conference_Location
    Sharjah
  • Print_ISBN
    978-1-4244-6998-7
  • Type

    conf

  • DOI
    10.1109/MECBME.2011.5752142
  • Filename
    5752142