• DocumentCode
    3549648
  • Title
    A simple strategy for detecting outlier samples in microarray data
  • Author
    Lu, Xuesong ; Li, Yanda ; Zhang, Xuegong
  • Author_Institution
    Dept. of Autom., Tsinghua Univ., Beijing, China
  • Volume
    2
  • fYear
    2004
  • fDate
    6-9 Dec. 2004
  • Firstpage
    1331
  • Abstract
    Microarrays can monitor the expression levels of thousands of genes simultaneously. Many researchers have used microarray gene expression data to classify different groups of samples, such as different types or subtypes of cancer. In our experiments, as well as those of other investigators, some microarray data sets have been observed to contain outlier samples, caused either by imperfections in the experiments or by possible mislabeling at certain steps. Such samples degrade classification accuracy and may even lead to misleading conclusions. In this paper, we study this problem with two simulated data sets representing typical scenarios and form a simple but powerful strategy for detecting such outlier or mislabeled samples, built upon cross validation of the basic SVM classifier. The strategy was applied to a public colon cancer data set, where it successfully detected 6 outlier cases. This work suggests an effective scheme for detecting outlier samples in a data set and for evaluating sample quality.
  • Keywords
    biology computing; genetics; molecular biophysics; pattern classification; support vector machines; gene expression data; microarray data sets; outlier samples detection; public colon cancer; support vector machines classifier; Automation; Bioinformatics; Cancer detection; Colon; Computerized monitoring; Diseases; Gene expression; Neoplasms; Support vector machine classification; Support vector machines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Control, Automation, Robotics and Vision Conference, 2004. ICARCV 2004 8th
  • Print_ISBN
    0-7803-8653-1
  • Type
    conf
  • DOI
    10.1109/ICARCV.2004.1469039
  • Filename
    1469039
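
The abstract describes detecting outlier or mislabeled samples through cross validation of a basic SVM classifier, but this record gives no further detail. The following is only a minimal sketch of that general idea, not the authors' procedure. It assumes leave-one-out cross validation, a linear soft-margin SVM trained by plain batch subgradient descent, and a tiny two-feature toy data set with two deliberately flipped labels. All function names, parameter values, and data are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=500):
    """Linear soft-margin SVM fitted by batch subgradient descent
    on the L2-regularized hinge loss (an assumed, simplified trainer)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # sample lies inside or beyond the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return the predicted class label, +1 or -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def loo_suspects(X, y):
    """Leave-one-out cross validation: flag every sample whose held-out
    prediction disagrees with its recorded label."""
    suspects = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        w, b = train_linear_svm(X_train, y_train)
        if predict(w, b, X[i]) != y[i]:
            suspects.append(i)
    return suspects


def make_toy_data():
    """Two well-separated classes; labels of samples 3 and 14 are flipped
    on purpose to imitate mislabeling (toy data, not microarray data)."""
    pos = [(2 + 0.1 * i, 2 - 0.1 * i) for i in range(10)]
    neg = [(-a, -c) for a, c in pos]
    X = pos + neg
    y = [1] * 10 + [-1] * 10
    for i in (3, 14):
        y[i] = -y[i]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("suspected outlier or mislabeled samples:", loo_suspects(X, y))
```

On real microarray data the number of genes far exceeds the number of samples, so a practical version would likely rely on an established SVM library and repeat the cross validation under several settings before treating a sample as suspect.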