Regional Information Center for Science and Technology - Dataset complexity in gene expression based cancer classification using ensembles of k-nearest neighbors

Title of article :

Dataset complexity in gene expression based cancer classification using ensembles of k-nearest neighbors

Author/Authors :

Okun, Oleg and Priisalu, Helen

Issue Information :

Journal issue with serial number, year 2009

Pages :

From page :

151

To page :

162

Abstract :

Objective: We explore the link between dataset complexity, which determines how difficult a dataset is to classify, and classification performance, defined as the low-variance, low-bias bolstered resubstitution error of k-nearest neighbor classifiers.

Methods and material: Gene expression based cancer classification is used as the task in this study. Six gene expression datasets covering different types of cancer constitute the test data.

Results: Through extensive simulation coupled with the copula method for analysis of association in bivariate data, we show that dataset complexity and bolstered resubstitution error are associated in terms of dependence. As a result, we propose a new scheme for generating ensembles of classifiers that selects low-complexity feature subsets for ensemble members, which are the accurate members according to the dependence relation found.

Conclusion: Experiments with six gene expression datasets demonstrate that our ensemble generation scheme, based on the dependence between dataset complexity and classification error, is superior to the single best classifier in the ensemble and to the traditional ensemble construction scheme that ignores dataset complexity.
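The ensemble scheme described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the record: the complexity measure is approximated by the Fisher discriminant ratio (a higher ratio is read as lower complexity), candidate feature subsets are drawn at random, the problem has exactly two classes, and members vote by simple majority. The bolstered resubstitution error and the copula analysis are not reproduced. All function names are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def fisher_ratio(X, y, features):
    """Largest per-feature Fisher discriminant ratio (two classes assumed).

    Used here as a stand-in complexity measure: a higher ratio means the
    classes are easier to separate, i.e. the subset has lower complexity.
    """
    classes = sorted(set(y))
    group_a = [row for row, label in zip(X, y) if label == classes[0]]
    group_b = [row for row, label in zip(X, y) if label == classes[1]]
    best = 0.0
    for f in features:
        col_a = [row[f] for row in group_a]
        col_b = [row[f] for row in group_b]
        gap = (mean(col_a) - mean(col_b)) ** 2
        spread = pvariance(col_a) + pvariance(col_b)
        if spread > 0:
            ratio = gap / spread
        else:
            ratio = float("inf") if gap > 0 else 0.0
        best = max(best, ratio)
    return best


def knn_predict(X, y, query, features, k=3):
    """Classify `query` by majority vote of its k nearest training rows,
    using squared Euclidean distance restricted to `features`."""
    dists = sorted(
        ((sum((row[f] - query[f]) ** 2 for f in features), label)
         for row, label in zip(X, y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def build_ensemble(X, y, n_members=5, subset_size=2, n_candidates=30, seed=0):
    """Draw random feature subsets and keep the lowest-complexity ones."""
    rng = random.Random(seed)
    n_features = len(X[0])
    candidates = [
        tuple(sorted(rng.sample(range(n_features), subset_size)))
        for _ in range(n_candidates)
    ]
    unique = list(dict.fromkeys(candidates))  # drop repeated subsets
    unique.sort(key=lambda feats: fisher_ratio(X, y, feats), reverse=True)
    return unique[:n_members]


def ensemble_predict(X, y, ensemble, query, k=3):
    """Majority vote over the k-NN members, one per feature subset."""
    votes = Counter(knn_predict(X, y, query, feats, k) for feats in ensemble)
    return votes.most_common(1)[0][0]
```

A call such as `ensemble_predict(X, y, build_ensemble(X, y), query)` would classify one sample. A traditional scheme that ignores complexity would correspond to skipping the sort in `build_ensemble`.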

Keywords :

Gene expression , Pattern recognition , Ensemble of classifiers , K-Nearest Neighbors , Cancer classification

Journal title :

Artificial Intelligence In Medicine

Serial Year :

2009

Record number :

1835105

Link To Document :

https://search.isc.ac/dl/search/defaultta.aspx?DTC=10&DC=1835105