Title of article :
Identifying a small set of marker genes using minimum expected cost of misclassification
Author/Authors :
Huang, Samuel H. and Mo, Dengyao and Meller, Jarek and Wagner, Michael
Issue Information :
Journal with serial issue number, year 2012
Abstract :
Objectives
This paper presents a model-independent feature selection approach to identify a small subset of marker genes.
Methods and material
An evaluation measure, minimum expected cost of misclassification (MECM), is used to estimate the discriminative power of a feature subset without building a model. The MECM measure is combined with sequential forward search for feature selection. This approach was applied to a breast cancer profiling problem, with the goal of identifying a small number of marker genes whose expression can be used to predict cancer molecular subtype (p53 gene status). The method was also applied to find a small set of single-nucleotide polymorphisms (SNPs) that can be used to predict a molecular phenotype of a different type, namely alleles (genetic variants) of human leukocyte antigen genes, which play important roles in autoimmunity.
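The search procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define how MECM is computed, so the cost function here (`loo_1nn_cost`, a cost-weighted leave-one-out nearest-neighbour error) is a hypothetical stand-in and not the authors' measure. The function names, the misclassification costs, and the toy data are likewise invented for the example.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def loo_1nn_cost(X: Matrix, y: Sequence[int], features: List[int],
                 cost_miss_pos: float = 1.0, cost_miss_neg: float = 1.0) -> float:
    """Stand-in cost (NOT the paper's MECM): average cost of leave-one-out
    1-nearest-neighbour errors, using only the given feature columns."""
    n = len(X)
    total = 0.0
    for i in range(n):
        nearest, nearest_dist = None, float("inf")
        for j in range(n):
            if j == i:
                continue  # leave sample i out
            dist = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if dist < nearest_dist:
                nearest, nearest_dist = j, dist
        if y[nearest] != y[i]:
            # Weight the error by the (assumed) cost of missing this class.
            total += cost_miss_pos if y[i] == 1 else cost_miss_neg
    return total / n


def sequential_forward_search(
    X: Matrix, y: Sequence[int], max_features: int,
    cost: Callable[[Matrix, Sequence[int], List[int]], float] = loo_1nn_cost,
) -> Tuple[List[int], float]:
    """Greedily add the feature that lowers the cost most; stop when no
    candidate improves the cost or max_features is reached."""
    selected: List[int] = []
    best_cost = float("inf")
    while len(selected) < max_features:
        candidates = [f for f in range(len(X[0])) if f not in selected]
        if not candidates:
            break
        new_cost, new_feature = min((cost(X, y, selected + [f]), f) for f in candidates)
        if new_cost >= best_cost:
            break  # no improvement, keep the smaller subset
        selected.append(new_feature)
        best_cost = new_cost
    return selected, best_cost


# Toy data (invented): column 1 separates the classes, columns 0 and 2 are noise.
X = [
    [0.5, 0.0, 3.0],
    [0.1, 0.2, 1.0],
    [0.9, 0.1, 2.0],
    [0.4, 1.0, 1.1],
    [0.2, 1.2, 2.9],
    [0.8, 0.9, 2.1],
]
y = [0, 0, 0, 1, 1, 1]

selected, best_cost = sequential_forward_search(X, y, max_features=2)
print(selected, best_cost)
```

Because the search stops as soon as no candidate lowers the cost, it favours small subsets, which matches the stated goal of finding few markers. Any other subset-scoring function with the same signature could be passed as `cost`.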
Results
Marker genes were identified based on p53 status, which achieved a p-value of 7.53 × 10^-5 (vs. 6 × 10^-4 with 32 genes identified by previous research) in survival analysis. Six SNP loci were identified that achieved a leave-one-out cross-validation accuracy of 92.8% (vs. 90.6% and 89.5% with 18 SNPs selected using χ² statistics and information gain, respectively).
Conclusion
The MECM-based feature selection approach is capable of identifying a smaller subset of marker genes with performance comparable to, or even better than, that obtained using conventional filter methods.
Keywords :
expected cost of misclassification , Sequential forward search , Breast cancer tumor classification , feature selection , Tag SNPs selection
Journal title :
Artificial Intelligence In Medicine