DocumentCode :
2341511
Title :
Simultaneous classification and feature clustering using discriminant vector quantization with applications to microarray data analysis
Author :
Li, Jia ; Zha, Hongyuan
Author_Institution :
Stat. Dept., Pennsylvania State Univ., University Park, PA, USA
fYear :
2002
fDate :
2002
Firstpage :
246
Lastpage :
255
Abstract :
In many applications of supervised learning, automatic feature clustering is often desirable for a better understanding of the interaction among the various features as well as the interplay between the features and the class labels. In addition, for high-dimensional data sets, feature clustering has the potential for improvement in classification accuracy and reduction in computational complexity. In this paper, a method is developed for simultaneous classification and feature clustering by extending discriminant vector quantization (DVQ), a prototype classification method derived from the principle of minimum description length using source coding techniques. The method incorporates feature clustering with classification performed by fusing features in the same clusters. To illustrate its effectiveness, the method has been applied to microarray gene expression data for human lymphoma classification. It is demonstrated that incorporating feature clustering improves classification accuracy, and the clusters generated match well with biologically meaningful gene expression signature groups.
Keywords :
arrays; biology computing; data analysis; genetics; learning (artificial intelligence); pattern classification; pattern clustering; vector quantisation; automatic feature clustering; biological meaningful gene expression signature groups; class labels; classification accuracy; computational complexity; discriminant vector quantization; high dimensional data sets; human lymphoma classification; microarray data analysis; microarray gene expression data; minimum description length; simultaneous classification/feature clustering; source coding techniques; supervised learning; Application software; Bioinformatics; Computational complexity; Computer science; Data analysis; Data engineering; Gene expression; Prototypes; Supervised learning; Vector quantization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society
Print_ISBN :
0-7695-1653-X
Type :
conf
DOI :
10.1109/CSB.2002.1039347
Filename :
1039347