Title :
Selection of statistical features based on mutual information for classification of human coding and non-coding DNA sequences
Author :
Liew, Alan Wee-chung ; Wu, Yonghui ; Yan, Hong
Author_Institution :
Dept. of Comput. Eng. & Inf. Technol., City Univ. of Hong Kong, China
Abstract :
The classification of human gene sequences into exons and introns is an important but difficult problem. We study the discriminative power of 22 statistical features in terms of their mutual information (MI). By performing correlation analysis, we identify a set of features that have high MI values and are complementary in their information content. Using this set, which consists of the three SZ features, the AMI feature, and the first stop codon feature, we achieve classification accuracy as high as 92%.
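As an illustration of the MI criterion mentioned in the abstract, the following is a minimal sketch of how the mutual information between one discretized feature and a binary coding/non-coding label could be estimated from counts. It is not the authors' implementation: the function names `mutual_information` and `equal_width_bins`, the equal-width binning, and the choice of four bins are all assumptions made for this example.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins=4):
    """Map real-valued feature values to integer bin indices.

    Equal-width binning is an assumption of this sketch; the record
    does not say how features were discretized.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant features
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(feature_bins, labels):
    """Estimate I(X; Y) in bits from paired discrete observations."""
    n = len(labels)
    joint = Counter(zip(feature_bins, labels))  # counts of (x, y) pairs
    px = Counter(feature_bins)                  # marginal counts of x
    py = Counter(labels)                        # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: 1 = coding (exon), 0 = non-coding (intron).
labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]    # feature bin determines the label
uninformative = [0, 1, 0, 1]  # feature bin is independent of the label
mi_informative = mutual_information(informative, labels)
mi_uninformative = mutual_information(uninformative, labels)
```

Under this sketch, features would be ranked by their estimated MI with the class label, and a correlation check between the top-ranked features could then be used to prefer features that are not redundant with one another.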
Keywords :
DNA; correlation theory; feature extraction; pattern classification; statistical analysis; correlation analysis; exons; human coding; human gene sequence classification; introns; mutual information; noncoding DNA sequence; statistical feature selection; Bioinformatics; DNA computing; Genomics; Humans; Information technology; Machine learning algorithms; Mutual information; Power engineering and energy; Sequences; Statistics;
Conference_Title :
Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
Print_ISBN :
0-7695-2128-2
DOI :
10.1109/ICPR.2004.1334641