• DocumentCode
    1269105
  • Title

    A Weighted Principal Component Analysis and Its Application to Gene Expression Data

  • Author

    Pinto da Costa, Joaquim F ; Alonso, Hugo ; Roque, L.

  • Author_Institution
    Dept. de Matemática, Univ. do Porto, Porto, Portugal
  • Volume
    8
  • Issue
    1
  • fYear
    2011
  • Firstpage
    246
  • Lastpage
    252
  • Abstract
    In this work, we introduce in the first part new developments in Principal Component Analysis (PCA) and in the second part a new method to select variables (genes in our application). Our focus is on problems where the values taken by each variable do not all have the same importance and where the data may be contaminated with noise and contain outliers, as is the case with microarray data. The usual PCA is not appropriate for this kind of problem. In this context, we propose the use of a new correlation coefficient as an alternative to Pearson's. This leads to a so-called weighted PCA (WPCA). In order to illustrate the features of our WPCA and compare it with the usual PCA, we consider the problem of analyzing gene expression data sets. In the second part of this work, we propose a new PCA-based algorithm to iteratively select the most important genes in a microarray data set. We show that this algorithm produces better results when our WPCA is used instead of the usual PCA. Furthermore, by using Support Vector Machines, we show that it can compete with the Significance Analysis of Microarrays algorithm.
  • Keywords
    bioinformatics; genomics; principal component analysis; support vector machines; Significance Analysis; Support Vector Machine; gene expression; microarray data; weighted PCA; weighted principal component analysis; Algorithm design and analysis; Data analysis; Gene expression; Iterative algorithms; Metabolomics; Noise level; Noise robustness; Principal component analysis; Support vector machines; Correlation; gene selection; Algorithms; Artificial Intelligence; Computational Biology; Data Mining; Databases, Genetic; Gene Expression Profiling; Humans; Oligonucleotide Array Sequence Analysis; Principal Component Analysis
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2009.61
  • Filename
    5184803