DocumentCode :
952238
Title :
Gene Classification Using Codon Usage and Support Vector Machines
Author :
Ma, Jianmin ; Nguyen, Minh N. ; Rajapakse, Jagath C.
Author_Institution :
BioInf. Res. Center, Nanyang Technol. Univ., Singapore
Volume :
6
Issue :
1
Year :
2009
Firstpage :
134
Lastpage :
143
Abstract :
A novel approach to gene classification is proposed, which uses codon usage bias as the input feature vector for classification by support vector machines (SVMs). The DNA sequence is first converted to a 59-dimensional feature vector in which each element corresponds to the relative synonymous usage frequency of a codon. Because the input to the classifier is independent of sequence length and variance, the approach is useful when the sequences to be classified are of different lengths, a condition under which homology-based methods tend to fail. The method is demonstrated on 1,841 Human Leukocyte Antigen (HLA) sequences, which are classified into two major classes, HLA-I and HLA-II; each major class is further subdivided into subgroups of HLA-I and HLA-II molecules. Using codon usage frequencies, a binary SVM achieved an accuracy rate of 99.3% for HLA major class classification, and a multi-class SVM achieved accuracy rates of 99.73% and 98.38% for sub-class classification of HLA-I and HLA-II molecules, respectively. The results show that gene classification based on codon usage bias is consistent with the molecular structures and biological functions of HLA molecules.
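The conversion of a coding sequence into a 59-dimensional codon usage vector can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard genetic code, assumes the commonly used RSCU definition (observed codon count divided by the count expected under uniform use within its synonymous family), and assumes a value of 0.0 when an amino acid does not occur in the sequence. The names `rscu_vector` and `FEATURE_CODONS` are hypothetical. The 59 dimensions arise from the 64 codons minus the 3 stop codons and the 2 codons (ATG, TGG) that have no synonyms.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group sense codons into synonymous families (stop codons excluded).
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)

# Feature order: all codons that have at least one synonym (59 in total).
FEATURE_CODONS = sorted(
    codon for family in FAMILIES.values() if len(family) > 1 for codon in family
)


def rscu_vector(seq):
    """Return the 59 RSCU values of an in-frame coding sequence.

    Assumption: RSCU = count * family_size / family_total, and 0.0 for
    every codon of an amino acid that is absent from the sequence.
    """
    seq = seq.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    # Codons with ambiguous bases (e.g. N) are ignored.
    counts = Counter(c for c in codons if c in CODON_TABLE)

    values = {}
    for family in FAMILIES.values():
        if len(family) == 1:
            continue  # Met (ATG) and Trp (TGG) carry no usage bias
        total = sum(counts[c] for c in family)
        for c in family:
            values[c] = counts[c] * len(family) / total if total else 0.0
    return [values[c] for c in FEATURE_CODONS]
```

Because each element is a ratio within a synonymous family, the vector has the same length and scale for sequences of any length, which is the property the abstract relies on. Such vectors could then be passed to any SVM implementation as fixed-length inputs.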
Keywords :
DNA; biology computing; genetics; molecular biophysics; pattern classification; support vector machines; DNA sequence; HLA-I molecules; HLA-II molecules; biological function; codon usage bias; codon usage frequencies; gene classification; human leukocyte antigen sequences; input feature vector; molecular structures; Cluster analysis; Human Leukocyte Antigen (HLA); Major Histocompatibility Complex (MHC); Relative Synonymous Codon Use (RSCU) frequency; Algorithms; Artificial Intelligence; Codon; Databases, Genetic; Discriminant Analysis; Genes; Genes, MHC Class I; Genes, MHC Class II; Genetic Code; HLA Antigens; Humans; Major Histocompatibility Complex; Normal Distribution; Pattern Recognition, Automated; Reproducibility of Results; Sequence Analysis, DNA
Language :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2007.70240
Filename :
4359889