Title :
Predicting gene function with positive and unlabeled examples
Author :
Chen, Yiming ; Li, Zhoujun ; Hu, Xiaohua ; Diao, Hongxiang ; Liu, Junwan
Author_Institution :
Comput. Sch., Nat. Univ. of Defence & Technol., Changsha, China
Abstract :
Predicting gene function is usually formulated as a binary classification problem. However, we only know which genes have a given function; we cannot be sure that a gene does not belong to a function class, which means that only positive examples are given. Therefore, selecting a good training example set becomes a key step. In this paper, we cluster the genes on an integrated weighted graph by generalizing the cluster coefficient of an unweighted graph to the weighted case, and identify reliable negative samples based on the distance between a gene and the centroids of the positive clusters. Then, the tri-training algorithm is used to learn three classifiers from labeled and unlabeled examples, and gene function is predicted by combining the three prediction results. The experimental results show that our approach outperforms several classic prediction methods.
Keywords :
genomics; graph theory; pattern classification; binary classification problem; cluster coefficient; gene clustering; gene function; tri-training algorithm; unweighted graph; weighted graph; Agricultural engineering; Bioinformatics; Clustering algorithms; Computer science; Educational institutions; Genomics; Information science; Large-scale systems; Prediction methods; Proteins;
Conference_Title :
Granular Computing, 2009, GRC '09. IEEE International Conference on
Conference_Location :
Nanchang
Print_ISBN :
978-1-4244-4830-2
DOI :
10.1109/GRC.2009.5255161