Title : 
A simple strategy for detecting outlier samples in microarray data
         
        
            Author : 
Lu, Xuesong ; Li, Yanda ; Zhang, Xuegong
         
        
            Author_Institution : 
Dept. of Autom., Tsinghua Univ., Beijing, China
         
        
        
        
        
        
            Abstract : 
Microarrays can monitor the expression levels of thousands of genes simultaneously. Many researchers have used microarray gene expression data to classify different groups of samples, such as different types or subtypes of cancer. In our experiments, as well as those of other investigators, some microarray data sets have been observed to contain outlier samples, caused either by imperfections in the experiments or by possible mislabeling at certain steps. Such samples can degrade classification accuracy and may even lead to misleading conclusions. In this paper, we study this problem with two simulated data sets representing typical scenarios and develop a simple but powerful strategy for detecting such outlier or mislabeled samples, built upon cross-validation of the basic SVM classifier. Applied to a public colon cancer data set, the strategy successfully detected 6 outlier cases. This work suggests an effective scheme for detecting outlier samples in a data set and for evaluating sample quality.
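The abstract does not spell out the detection procedure, so the following is only a minimal sketch of one plausible reading: hold out each sample in turn, train a linear SVM on the rest, and flag the sample if the held-out prediction disagrees with its recorded label. Everything here is an assumption made for illustration and not the authors' implementation: the leave-one-out scheme, the batch subgradient trainer standing in for a standard SVM solver, the hyperparameters `lam`, `eta`, and `iters`, the helper names, and the toy data with three deliberately flipped labels.

```python
import random

def train_linear_svm(X, y, lam=0.01, eta=0.05, iters=200):
    """Batch subgradient descent on the L2-regularized hinge loss (linear kernel).

    Assumption: a stand-in for a proper SVM solver, kept dependency-free.
    """
    Xa = [list(x) + [1.0] for x in X]      # constant feature acts as the bias term
    n = len(Xa)
    w = [0.0] * len(Xa[0])
    for _ in range(iters):
        grad = [lam * wj for wj in w]      # gradient of the L2 penalty
        for xi, yi in zip(Xa, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:               # only margin violators contribute
                for j, xj in enumerate(xi):
                    grad[j] -= yi * xj / n
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Predict +1 or -1 from the sign of the linear score."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

def loo_outliers(X, y):
    """Return indices whose leave-one-out prediction disagrees with the label."""
    flagged = []
    for i in range(len(X)):
        w = train_linear_svm(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict(w, X[i]) != y[i]:
            flagged.append(i)
    return flagged

def make_toy_data(n=30, dim=5, flipped=(3, 17, 24), seed=0):
    """Two well-separated Gaussian classes; labels at `flipped` are inverted.

    Hypothetical data, not one of the paper's simulated data sets.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([rng.gauss(2.0 * label, 1.0) for _ in range(dim)])
        y.append(-label if i in flipped else label)
    return X, y

X, y = make_toy_data()
flagged = loo_outliers(X, y)
print("flagged sample indices:", flagged)
```

On real microarray data the number of genes far exceeds the number of samples, so a single cross-validation pass may be unstable; repeating the procedure over several gene subsets or resamplings and counting how often each sample is misclassified would be a natural, though here untested, extension.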
         
        
            Keywords : 
biology computing; genetics; molecular biophysics; pattern classification; support vector machines; gene expression data; microarray data sets; outlier samples detection; public colon cancer; support vector machines classifier; Automation; Bioinformatics; Cancer detection; Colon; Computerized monitoring; Diseases; Gene expression; Neoplasms; Support vector machine classification; Support vector machines;
         
        
        
        
            Conference_Title : 
Control, Automation, Robotics and Vision Conference, 2004. ICARCV 2004 8th
         
        
            Print_ISBN : 
0-7803-8653-1
         
        
        
            DOI : 
10.1109/ICARCV.2004.1469039