Title of article :
Unimodal transform of variables selected by interval segmentation purity for classification tree modeling of high-dimensional microarray data
Author/Authors :
Du, Wen and Gu, Ting and Tang, Lijuan and Jiang, Jian-Hui and Wu, Hai-Long and Shen, Guo-Li and Yu, Ru-Qin
Issue Information :
Monthly journal, serially numbered issue, 2011
Abstract :
As a greedy search algorithm, classification and regression tree (CART) is prone to overfitting when modeling microarray gene expression data. A straightforward solution is to filter out irrelevant genes by identifying the significant ones. However, some significant genes have multi-modal expression patterns, with systematic differences among samples of the same class, and are difficult to identify with existing methods. To address this, a strategy is proposed for CART modeling: unimodal transform of variables selected by interval segmentation purity (UTISP). First, significant genes exhibiting varied expression patterns are identified by a variable selection method based on interval segmentation purity. Then, a unimodal transform is applied as a feature extraction step to provide variables with unimodal features for CART modeling. Because significant genes with complex expression patterns are identified and their unimodal features extracted in advance, the strategy can potentially improve the performance of CART by reducing overfitting or underfitting when modeling microarray data. The strategy is demonstrated on two microarray data sets. The results show that UTISP-based CART outperforms k-nearest neighbors and CART coupled with other gene identification strategies, indicating that UTISP-based CART holds great promise for microarray data analysis.
Keywords :
Classification and regression tree, Variable selection, Gene expression, Mean shift, Interval segmentation purity