DocumentCode :
2777226
Title :
Probabilistic Latent Semantic Analysis for prediction of Gene Ontology annotations
Author :
Masseroli, Marco ; Chicco, Davide ; Pinoli, Pietro
Author_Institution :
Dipt. di Elettron. e Inf., Politec. di Milano, Milan, Italy
fYear :
2012
fDate :
10-15 June 2012
Firstpage :
1
Lastpage :
8
Abstract :
Consistency and completeness of biomolecular annotations are key to the correct interpretation of biological experiments. Yet the correctly annotated associations between genes (or proteins) and features are only a fraction of all those that exist. Over time they grow in number and usefulness, but they remain incomplete, and some are incorrect. Computational methods that supply a ranked list of predicted annotations are therefore extremely useful, both to support and speed up the time-consuming curation procedure and to improve the consistency of available annotations. Building on previous work on the automatic prediction of Gene Ontology (GO) annotations based on the Singular Value Decomposition of the annotation matrix, in which every element corresponds to the association of a gene with a feature, we propose a modified Probabilistic Latent Semantic Analysis (pLSA) algorithm, named pLSAnorm, to improve such prediction. pLSA is a statistical technique from the natural language processing field that has not yet been used for annotation prediction in bioinformatics; it exploits the latent information contained in the co-occurrences of the analyzed data. We evaluated the effectiveness of the pLSAnorm prediction method by performing k-fold cross-validation on the GO annotations of two organisms, Gallus gallus and Bos taurus. The results obtained demonstrate the efficacy of our approach.
Keywords :
bioinformatics; genetics; ontologies (artificial intelligence); probability; singular value decomposition; Bos taurus; Gallus gallus; annotation matrix; bioinformatics annotation prediction; biomolecular annotations; gene ontology annotation prediction; k-fold cross-validation; pLSAnorm; probabilistic latent semantic analysis algorithm; singular value decomposition; statistical technique; time-consuming curation procedure; Bioinformatics; Matrix decomposition; Ontologies; Prediction algorithms; Probabilistic logic; Semantics; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks (IJCNN), The 2012 International Joint Conference on
Conference_Location :
Brisbane, QLD
ISSN :
2161-4393
Print_ISBN :
978-1-4673-1488-6
Electronic_ISBN :
2161-4393
Type :
conf
DOI :
10.1109/IJCNN.2012.6252767
Filename :
6252767