Title :
Prediction of protein allergenicity based on signal-processing bioinformatics approach
Author :
Chrysostomou, Charalambos ; Seker, Huseyin
Author_Institution :
Dept. of Genetics, Univ. of Leicester, Leicester, UK
Abstract :
Current bioinformatics tools achieve high accuracy in classifying allergenic protein sequences with high homology but generally perform poorly on low-homology protein sequences. Although some homologous regions have been shown to explain immunoglobulin E (IgE) cross-reactivity within groups of allergens, no universal molecular structure has been associated with allergenicity. In addition, studies have shown that cross-reactivity is not directly linked to the homology between protein sequences. A homology-independent method is therefore needed to determine whether a protein is an allergen. The aim of this study is to differentiate sets of allergenic and non-allergenic proteins using a signal-processing-based bioinformatics approach. In this paper, a new method is proposed for the characterisation and classification of allergenic protein sequences. A hydrophobicity amino acid index was used to encode proteins as numerical sequences, and the discrete Fourier transform (DFT) was used to extract features for each protein. Finally, a classifier was constructed based on support vector machines (SVMs). To demonstrate the applicability of the proposed method, 857 allergen and 1000 non-allergen proteins were collected from the UniProt online database. The proposed method yielded a Matthews correlation coefficient (MCC) of 0.752 ± 0.007, a specificity of 0.912 ± 0.005, a sensitivity of 0.835 ± 0.008 and a total accuracy of 0.8765 ± 0.004 (87.65%).
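The encoding and feature-extraction steps, together with the reported evaluation metrics, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the hydrophobicity index, so the Kyte-Doolittle scale is assumed; the fixed DFT length `n` (zero-padding or truncating each protein so every sequence yields the same number of features) is also an assumption; and the helper names `encode`, `dft_magnitudes` and `metrics` are hypothetical. The SVM training step is omitted.

```python
import cmath
import math

# Kyte-Doolittle hydrophobicity scale. ASSUMPTION: the abstract does not state
# which hydrophobicity amino acid index was used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Map each residue to its hydrophobicity value; unknown residues map to 0.0."""
    return [KYTE_DOOLITTLE.get(aa, 0.0) for aa in sequence.upper()]


def dft_magnitudes(signal, n):
    """Zero-pad or truncate `signal` to length n, then return |X[k]| for k = 0..n//2.

    Fixing n (an assumed design choice) gives every protein the same number of
    features regardless of its length. This is a direct O(n^2) DFT for clarity;
    an FFT would normally be used in practice.
    """
    x = (list(signal) + [0.0] * n)[:n]
    mags = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(s))
    return mags


def metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


if __name__ == "__main__":
    # Hypothetical toy sequence and counts, for illustration only.
    features = dft_magnitudes(encode("MKTAYIAK"), 16)
    print(len(features), round(features[0], 3))
    print(metrics(tp=80, tn=90, fp=10, fn=20))
```

Each protein's feature vector (for example, the list returned by `dft_magnitudes`) would then be passed to an SVM classifier such as scikit-learn's `sklearn.svm.SVC`; the choice of kernel and the value of `n` are not specified in the abstract. Note that `metrics` reports accuracy as a fraction, which is why 87.65% corresponds to 0.8765.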
Keywords :
bioinformatics; discrete Fourier transforms; feature extraction; hydrophobicity; medical signal processing; proteins; signal processing; support vector machines; UniProt online database; allergenic protein sequences; homologous regions; homology independent method; hydrophobicity amino acid index; immunoglobulin E cross-reactivity; low homology protein sequences; nonallergenic proteins; numerical sequences; protein allergenicity; signal-processing bioinformatics approach; universal molecular structure; Amino acids; Indexes;
Conference_Title :
Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE
Conference_Location :
Chicago, IL
DOI :
10.1109/EMBC.2014.6943714