Title :
An insight on complexity measures and classification in microarray data
Author :
Verónica Bolón-Canedo;Laura Morán-Fernández;Amparo Alonso-Betanzos
Author_Institution :
Department of Computer Science, University of A Coruña
fDate :
7/1/2015 12:00:00 AM
Abstract :
Microarray data classification has typically been regarded as a difficult challenge for machine learning researchers, mainly because of its very high feature dimensionality combined with a small sample size. However, this type of data presents other complications, such as class overlap, dataset shift, class imbalance, non-linearity, and features extracted under extremely different distributions. This paper analyzes in depth the theoretical complexity of several popular binary microarray datasets by means of complexity measures, and then relates it to the empirical results obtained by four widely used classifiers. Two situations are covered: datasets provided with only a training set, and datasets originally divided into training and test sets. In both cases it is demonstrated that there is a correlation between the complexity measures and the actual error rates, which may help, in the future, to decide how to deal with a given dataset. Finally, we present a case study on the Prostate dataset, improving the test classification accuracy from 53% to 97%.
Keywords :
"Support vector machines","Colon","Proteins","DNA","Nickel"
Conference_Title :
Neural Networks (IJCNN), 2015 International Joint Conference on
Electronic_ISBN :
2161-4407
DOI :
10.1109/IJCNN.2015.7280302