Title :
Molecular Function Prediction Using Neighborhood Features
Author :
Bogdanov, Petko ; Singh, Ambuj K.
Author_Institution :
Dept. of Comput. Sci., Univ. of California, Santa Barbara, CA, USA
Abstract :
The recent advent of high-throughput methods has generated large amounts of gene interaction data. This has allowed the construction of genome-wide networks. A significant number of genes in such networks remain uncharacterized, and predicting the molecular function of these genes remains a major challenge. A number of existing techniques assume that genes with similar functions are topologically close in the network. Our hypothesis is that genes with similar functions exhibit similar annotation patterns in their neighborhoods, regardless of the distance between them in the interaction network. We thus predict molecular functions of uncharacterized genes by comparing their functional neighborhoods to those of genes with known function. We propose a two-phase approach. First, we extract functional neighborhood features of a gene using Random Walks with Restarts. We then employ a k-nearest-neighbor (KNN) classifier to predict the function of uncharacterized genes based on the computed neighborhood features. We perform leave-one-out validation experiments on two S. cerevisiae interaction networks and show significant improvements over previous techniques. Our technique provides a natural control of the trade-off between accuracy and coverage of prediction. We further propose and evaluate prediction in sparse genomes by exploiting features from well-annotated genomes.
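The two-phase approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the restart probability of 0.3, the Euclidean distance between feature vectors, the plain vote among the k nearest genes, and the toy network with its gene and function names are all assumptions made for illustration only.

```python
import math
from collections import Counter


def build_graph(edges):
    """Build an undirected, unit-weight adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, {})[v] = 1.0
        adj.setdefault(v, {})[u] = 1.0
    return adj


def rwr(adj, start, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visit probabilities of a random walk with restarts.

    At every step the walker jumps back to `start` with probability
    `restart` (an assumed value), otherwise it follows a weighted edge.
    """
    p = {n: 0.0 for n in adj}
    p[start] = 1.0
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in adj}
        for u, pu in p.items():
            if pu == 0.0:
                continue
            total = sum(adj[u].values())
            if total == 0:
                # Isolated node: send its walking mass back to the start.
                nxt[start] += (1 - restart) * pu
                continue
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * pu * w / total
        nxt[start] += restart
        delta = sum(abs(nxt[n] - p[n]) for n in adj)
        p = nxt
        if delta < tol:
            break
    return p


def neighborhood_features(adj, known, gene, functions, restart=0.3):
    """Phase 1: one feature per function, the RWR mass on genes carrying it."""
    p = rwr(adj, gene, restart)
    return [
        sum(prob for n, prob in p.items() if n != gene and f in known.get(n, ()))
        for f in functions
    ]


def predict(adj, annotations, gene, k=3, restart=0.3):
    """Phase 2: KNN vote over neighborhood features (Euclidean distance assumed)."""
    # Hide the query gene's own labels, as in leave-one-out validation.
    known = {g: fs for g, fs in annotations.items() if g != gene}
    functions = sorted({f for fs in known.values() for f in fs})
    target = neighborhood_features(adj, known, gene, functions, restart)
    scored = sorted(
        (math.dist(target, neighborhood_features(adj, known, g, functions, restart)), g)
        for g in known
    )
    votes = Counter()
    for _, g in scored[:k]:
        votes.update(known[g])
    return votes.most_common()


# Hypothetical toy network: x and y are six edges apart, yet both sit next to
# one F1 gene and one F2 gene, so their neighborhood features nearly coincide.
adj = build_graph([
    ("x", "n1"), ("x", "n2"), ("y", "m1"), ("y", "m2"),
    ("z", "q1"), ("z", "q2"), ("n1", "q1"), ("m1", "q2"),
])
annotations = {
    "n1": {"F1"}, "n2": {"F2"}, "m1": {"F1"}, "m2": {"F2"},
    "q1": {"F3"}, "q2": {"F3"}, "x": {"T"}, "z": {"U"},
}

if __name__ == "__main__":
    print(predict(adj, annotations, "y", k=1))
```

In this toy case the uncharacterized gene y is matched to x through its annotation pattern rather than through proximity in the network, which is the hypothesis the abstract states. Choosing a larger k, or predicting only when the vote is decisive, is one simple way such a scheme could trade coverage for accuracy; the abstract does not specify how the authors control that trade-off.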
Keywords :
bioinformatics; genetics; molecular biophysics; pattern classification; random processes; KNN classifier; Saccharomyces cerevisiae interaction networks; annotation patterns; functional neighborhood feature extraction; gene interaction data; gene interaction network; gene molecular function; genomewide networks; high throughput methods; leave one out validation experiments; molecular function prediction; neighborhood features; random walks with restarts; Gene function prediction; classification; feature extraction; functional interaction network; Animals; Databases, Genetic; Gene Regulatory Networks; Genes; Genomics; Models, Statistical; Pattern Recognition, Automated; ROC Curve; Saccharomyces cerevisiae
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2009.81