Title :
A logistic regression based algorithm for identifying human disease genes
Author :
Bolin Chen ; Min Li ; Jianxin Wang ; Fang-xiang Wu
Author_Institution :
Div. of Biomed. Eng., Univ. of Saskatchewan, Saskatoon, SK, Canada
Abstract :
The identification of disease genes is the first step towards understanding genetic disease mechanisms. Although many computational algorithms have been proposed to identify disease genes, they either perform poorly in terms of AUC scores or are very time consuming. To overcome these two problems, a logistic regression based algorithm is proposed in this study for identifying disease genes. Disease gene identification is formulated as a two-class classification problem, where one class represents disease genes and the other represents non-disease genes. A binary logistic regression is employed to predict the posterior probability that a gene is associated with a disease, taking prior labels as the categorical dependent variable and label related feature vectors as predictor variables. Numerical experiments show that the proposed logistic regression based algorithm not only performs very well, but also significantly reduces the computing time. The AUC score is 0.737 when no prior information is used, and it increases to 0.766 when protein complex data are integrated. On average, when generating one prediction in leave-one-out cross validation, the proposed algorithm takes only 1.31% of the running time of the existing MRF method and 37.35% of that of the RWR algorithm.
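The pipeline described in the abstract (binary logistic regression, leave-one-out cross validation, AUC) can be illustrated with a minimal sketch. This is not the authors' implementation: the two-column feature vectors, the toy labels, the batch gradient descent fit, and all function names and hyperparameters (`fit_logistic`, `lr`, `epochs`, `l2`) are assumptions made purely for illustration, since the abstract does not specify how the label related features are built or how the model is fitted.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Posterior probability of the positive class; w[0] is the bias."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=500, l2=0.01):
    """Fit weights by batch gradient descent on the L2-penalised log-loss.

    Gradient descent is an illustrative choice, not the paper's solver.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        for j in range(d + 1):
            penalty = l2 * w[j] if j > 0 else 0.0  # the bias is not penalised
            w[j] -= lr * (grad[j] / n + penalty)
    return w


def leave_one_out_scores(X, y, **fit_kwargs):
    """Score each sample with a model trained on all the other samples."""
    scores = []
    for i in range(len(X)):
        w = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **fit_kwargs)
        scores.append(predict_proba(w, X[i]))
    return scores


def auc(y, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: each row is an assumed label related feature vector,
# e.g. the fractions of a gene's network neighbours labelled disease and
# non-disease. Labels: 1 = disease gene, 0 = non-disease gene.
X = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2], [0.6, 0.4],
     [0.2, 0.8], [0.1, 0.9], [0.3, 0.7], [0.4, 0.6]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

scores = leave_one_out_scores(X, y)
print("leave-one-out AUC:", auc(y, scores))
```

Each leave-one-out fold refits the model from scratch, so the cost of generating one prediction is the cost of one model fit. That per-prediction cost is the quantity the abstract's running time comparison refers to.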
Keywords :
bioinformatics; data integration; diseases; feature extraction; genetic algorithms; genetics; genomics; logistics data processing; molecular biophysics; pattern classification; probability; proteins; regression analysis; AUC scores; MRF method; RWR algorithm; binary logistic regression; categorical dependent variables; computational algorithms; computing time; genetic disease mechanisms; human disease gene identification; label related feature vectors; leave-one-out cross validation method; logistic regression based algorithm; numerical experiments; posterior probability; predictor variables; protein complex data integration; two-class classification problem; Bioinformatics; Diseases; Logistics; Prediction algorithms; Protein engineering; Proteins; Vectors; human disease gene; logistic regression; protein complex; protein-protein interaction;
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
Conference_Location :
Belfast
DOI :
10.1109/BIBM.2014.6999153