Title :
Combining one-class support vector machines for microarray classification
Author :
Krawczyk, Bartosz
Author_Institution :
Dept. of Syst. & Comput. Networks, Wroclaw Univ. of Technol., Wrocław, Poland
Abstract :
The advance of high-throughput techniques, such as gene microarrays and protein chips, has had a major impact on contemporary biology and medicine. Because the data are high-dimensional and complex, they cannot be analyzed manually, so machine learning techniques play an important role in handling them. In this paper we propose a one-class approach to classifying microarrays. Unlike canonical classifiers, these models rely only on objects drawn from a single class distribution. They distinguish observations belonging to the given class from any other possible states of the object that were unseen during the classification step. Although they have less information with which to dichotomize between classes, one-class models can easily learn the specific properties of a given dataset and are robust to difficulties inherent in the data. We show that one-class support vector machines can give results as good as canonical multi-class classifiers while coping with imbalanced distributions and unexpected noise in the data. To cope with the high dimensionality of the feature space, we propose forming an ensemble based on the Random Subspace method and pruning it using a diversity measure. Experimental investigations carried out on public datasets demonstrate the usefulness of the proposed approach.
Keywords :
bioinformatics; data analysis; genetics; learning (artificial intelligence); pattern classification; support vector machines; biology; data complexity; data noise; dataset property learning; diversity measure; gene microarray; high-throughput technique; imbalanced distribution; machine learning technique; medicine; microarray classification; one-class support vector machines; protein chips; pruning; random subspace; single class distribution; Accuracy; Breast cancer; Kernel; Noise; Support vector machines; Training; bioinformatics; classifier ensembles; high dimensionality; machine learning; microarray analysis; multiple classifier systems; one-class classification;
Conference_Title :
Computer Science and Information Systems (FedCSIS), 2013 Federated Conference on
Conference_Location :
Kraków