DocumentCode :
3549648
Title :
A simple strategy for detecting outlier samples in microarray data
Author :
Lu, Xuesong ; Li, Yanda ; Zhang, Xuegong
Author_Institution :
Dept. of Autom., Tsinghua Univ., Beijing, China
Volume :
2
fYear :
2004
fDate :
6-9 Dec. 2004
Firstpage :
1331
Abstract :
Microarrays can monitor the expression levels of thousands of genes simultaneously. Gene expression data obtained with microarrays have been widely used to classify different groups of samples, such as different types or subtypes of cancer. In our experiments, as well as those of some other investigators, it has been observed that some microarray data sets may contain outlier samples, caused either by imperfections in the experiments or by possible mislabeling at certain steps. The existence of such samples degrades classification accuracy and may even lead to misleading conclusions. In this paper, we studied this problem with two simulated data sets representing typical scenarios and developed a simple but powerful strategy for detecting such outlier or mislabeled samples, built upon cross-validation of the basic SVM classifier. The strategy was applied to a public colon cancer data set and successfully detected 6 outlier cases. This work suggests an effective scheme for detecting outlier samples in a data set and for evaluating sample quality.
Keywords :
biology computing; genetics; molecular biophysics; pattern classification; support vector machines; gene expression data; microarray data sets; outlier samples detection; public colon cancer; support vector machines classifier; Automation; Bioinformatics; Cancer detection; Colon; Computerized monitoring; Diseases; Gene expression; Neoplasms; Support vector machine classification; Support vector machines
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Control, Automation, Robotics and Vision Conference, 2004. ICARCV 2004 8th
Print_ISBN :
0-7803-8653-1
Type :
conf
DOI :
10.1109/ICARCV.2004.1469039
Filename :
1469039