Title :
Effectiveness of information extraction, multi-relational, and semi-supervised learning for predicting functional properties of genes
Author :
Krogel, Mark-A ; Scheffer, Tobias
Author_Institution :
FIN/IWS, Univ. of Magdeburg, Germany
Abstract :
We focus on the problem of predicting functional properties of the proteins corresponding to genes in the yeast genome. Our goal is to study the effectiveness of approaches that utilize all data sources available in this problem setting, including unlabeled data, relational data, and abstracts of research papers. We study transduction and co-training for using unlabeled data. We investigate a propositionalization approach that uses relational gene interaction data. We study the benefit of information extraction for utilizing a collection of scientific abstracts. The tasks studied are the KDD Cup tasks of 2001 and 2002. The solutions that we describe achieved the highest score for task 2 in 2001, the fourth rank for task 3 in 2001, and, in 2002, the highest score for one of the two subtasks and third place overall for task 2.
Keywords :
data mining; information retrieval; learning (artificial intelligence); relational databases; co-training; gene functional property prediction; information extraction; multirelational data; propositionalization approach; relational gene interaction data; semisupervised learning; unlabeled data; yeast genome; Abstracts; Bioinformatics; Computer science; Data mining; Fungi; Genetics; Genomics; Hidden Markov models; Proteins; Semisupervised learning;
Conference_Title :
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN :
0-7695-1978-4
DOI :
10.1109/ICDM.2003.1250979