DocumentCode :
636517
Title :
Signal-processing-based bioinformatics approach for the identification of influenza A virus subtypes in Neuraminidase genes
Author :
Chrysostomou, Charalambos ; Seker, Huseyin
Author_Institution :
Dept. of Genetics, Univ. of Leicester, Leicester, UK
fYear :
2013
fDate :
3-7 July 2013
Firstpage :
3066
Lastpage :
3069
Abstract :
The neuraminidase (NA) gene of the influenza A virus is a promising target for antiviral drug development, but exploiting it depends on correct identification of the virus subtypes. To detect the subtypes accurately, this paper develops a hybrid predictive model and tests it on proteins from four subtypes of the influenza A virus, namely H1N1, H2N2, H3N2 and H5N1, which caused major pandemics in the twentieth century. The predictive model is built in four main steps: (i) converting the protein sequences into numerical signals by means of the electron-ion interaction potential (EIIP) amino acid scale, (ii) analysing these signals with the Discrete Fourier Transform (DFT) and extracting DFT-based features, (iii) selecting the most influential subset of the features with the F-score statistical feature selection method, and (iv) building a predictive model on the feature subset with a support vector machine classifier. The protein sequences were chosen for the high percentage identity they show within individual influenza subtype classes and for the high variation they display in percentage identity. This makes the proteins very difficult to distinguish from each other even when they belong to different subtypes. On this set of proteins, the predictive model yielded 98.3% accuracy under 5-fold cross-validation. The procedure also yields a twenty-feature subset that can help reveal spectral characteristics of the subtypes. The proposed model is promising and can be generalized to other similar studies.
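The first three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The EIIP values are the commonly cited ones and should be checked against the scale used in the paper. The mean removal, the optional zero-padding parameter `n_points`, and the multi-class form of the F-score are assumptions made here for illustration, and the function names are hypothetical. The support vector machine step is omitted because it would need an external library.

```python
import cmath
from statistics import mean, variance

# Commonly cited EIIP values per amino acid (assumption: verify against the paper's scale).
EIIP = {
    "A": 0.0373, "R": 0.0959, "N": 0.0036, "D": 0.1263, "C": 0.0829,
    "Q": 0.0761, "E": 0.0058, "G": 0.0050, "H": 0.0242, "I": 0.0000,
    "L": 0.0000, "K": 0.0371, "M": 0.0823, "F": 0.0946, "P": 0.0198,
    "S": 0.0829, "T": 0.0941, "W": 0.0548, "Y": 0.0516, "V": 0.0057,
}


def encode_eiip(sequence):
    """Step (i): map each residue letter to its EIIP value; unknown letters are skipped."""
    return [EIIP[aa] for aa in sequence.upper() if aa in EIIP]


def dft_power_spectrum(signal, n_points=None):
    """Step (ii): power spectrum |X[k]|^2 for k = 1 .. N//2.

    The mean is removed first, and the signal is zero-padded to n_points
    (an illustrative choice) so sequences of different length share one feature grid.
    """
    mu = sum(signal) / len(signal)
    centered = [x - mu for x in signal]
    n = n_points if n_points is not None else len(centered)
    centered = centered[:n] + [0.0] * (n - len(centered))
    spectrum = []
    for k in range(1, n // 2 + 1):
        xk = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                 for t, x in enumerate(centered))
        spectrum.append(abs(xk) ** 2)
    return spectrum


def f_score(values, labels):
    """Step (iii): F-score of one feature, here in an assumed multi-class form:
    squared deviations of class means from the overall mean, divided by the
    sum of within-class sample variances."""
    overall = mean(values)
    between = 0.0
    within = 0.0
    for c in set(labels):
        group = [v for v, l in zip(values, labels) if l == c]
        between += (mean(group) - overall) ** 2
        within += variance(group) if len(group) > 1 else 0.0
    if within == 0.0:
        return float("inf") if between > 0.0 else 0.0
    return between / within


if __name__ == "__main__":
    # Toy fragment, not a real neuraminidase sequence.
    signal = encode_eiip("MNPNQKIITIGSV")
    features = dft_power_spectrum(signal, n_points=16)
    print(len(features), "spectral features")
```

In a full pipeline, each protein would contribute one row of spectral features, `f_score` would be computed per column to rank the features, and the top-ranked columns would be passed to a classifier.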
Keywords :
bioinformatics; discrete Fourier transforms; diseases; feature extraction; microorganisms; molecular biophysics; molecular configurations; proteomics; support vector machines; DFT based feature extraction; EIIP amino acid scale; F-score statistical feature selection method; H1N1; H2N2; H3N2; H5N1; Neuraminidase genes; antiviral drug development; discrete Fourier transform; hybrid predictive model; influenza A virus subtype identification; numerical signals; percentage identity; protein sequence decoding; signal processing based bioinformatics; support vector machine classifier; Accuracy; Amino acids; Discrete Fourier transforms; Feature extraction; Predictive models; Proteins; Support vector machines; Amino Acid Indices; Discrete Fourier Transform (DFT); F-score; Neuraminidase Genes; Support Vector Machines;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE
Conference_Location :
Osaka
ISSN :
1557-170X
Type :
conf
DOI :
10.1109/EMBC.2013.6610188
Filename :
6610188