DocumentCode
3549648
Title
A simple strategy for detecting outlier samples in microarray data
Author
Lu, Xuesong ; Li, Yanda ; Zhang, Xuegong
Author_Institution
Dept. of Autom., Tsinghua Univ., Beijing, China
Volume
2
fYear
2004
fDate
6-9 Dec. 2004
Firstpage
1331
Abstract
Microarrays can monitor the expression levels of thousands of genes simultaneously. Many researchers have used gene expression data obtained with microarrays to classify different groups of samples, such as different types or subtypes of cancer. In our experiments, as well as those of some other investigators, some microarray data sets have been observed to contain outlier samples, caused either by imperfections in the experiments or by possible mislabeling at certain steps. Such samples affect classification accuracy and may even lead to misleading conclusions. In this paper, we study this problem with two simulated data sets representing typical scenarios and develop a simple but powerful strategy for detecting such outlier or mislabeled samples, built upon cross-validation of the basic SVM classifier. Applied to a public colon cancer data set, the strategy detected 6 outlier cases. This work suggests an effective scheme for detecting outlier samples in a data set and for evaluating sample quality.
Keywords
biology computing; genetics; molecular biophysics; pattern classification; support vector machines; gene expression data; microarray data sets; outlier samples detection; public colon cancer; support vector machines classifier; Automation; Bioinformatics; Cancer detection; Colon; Computerized monitoring; Diseases; Gene expression; Neoplasms; Support vector machine classification; Support vector machines
fLanguage
English
Publisher
IEEE
Conference_Titel
Control, Automation, Robotics and Vision Conference, 2004. ICARCV 2004 8th
Print_ISBN
0-7803-8653-1
Type
conf
DOI
10.1109/ICARCV.2004.1469039
Filename
1469039