Title :
A supervised machine learning approach of extracting coexpression relationship among genes from literature
Author :
Tiwari, Richa ; Zhang, Chengcui ; Solorio, Thamar
Author_Institution :
Dept. of Comput. & Inf. Sci., Univ. of Alabama at Birmingham, Birmingham, AL, USA
Abstract :
It is vital to develop automatic information extraction systems that help researchers cope with the vast amount of data available on the Internet. In this paper, we describe a framework for extracting precise information about coexpression relationships among genes from published literature using a supervised machine learning approach. We use a graphical model, Dynamic Conditional Random Fields (DCRFs), to train our classifier. Rather than detecting the presence of keywords, our approach relies on semantic analysis of the text to classify the predicates that describe coexpression relationships. We compared our sentence classification results against a word-matching baseline and a Naïve Bayes classification algorithm. Our framework outperformed the baseline by almost 45%, with DCRFs showing superior performance to Naïve Bayes.
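The two comparison methods named in the abstract, word matching and Naïve Bayes sentence classification, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the keyword list, the tokenizer, the toy sentences, and the gene names are hypothetical and do not come from the paper, and the DCRF model itself is not reproduced here.

```python
import math
import re
from collections import Counter

# Hypothetical keyword list for the word-matching baseline (not from the paper).
KEYWORDS = {"coexpressed", "coexpression", "co-expressed", "co-expression"}


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens (hyphenated words kept whole)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())


def keyword_baseline(sentence):
    """Label a sentence positive if it contains any keyword, regardless of context."""
    return any(tok in KEYWORDS for tok in tokenize(sentence))


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words features with add-one smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)   # sentences per class
        self.word_counts = {}                 # class -> token counts
        self.vocab = set()
        for sent, label in zip(sentences, labels):
            toks = tokenize(sent)
            self.word_counts.setdefault(label, Counter()).update(toks)
            self.vocab.update(toks)
        return self

    def predict(self, sentence):
        toks = tokenize(sentence)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n / total)
            for tok in toks:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training sentences (True = describes a coexpression relationship).
TRAIN = [
    ("GeneA and GeneB are coexpressed in liver tissue", True),
    ("GeneC is expressed together with GeneD during development", True),
    ("GeneE is not coexpressed with GeneF", False),
    ("GeneG was sequenced in this study", False),
]

model = NaiveBayes().fit([s for s, _ in TRAIN], [y for _, y in TRAIN])
```

In this toy setup, `keyword_baseline("GeneE is not coexpressed with GeneF")` returns `True` even though the sentence negates the relationship, which is the kind of error that motivates classifying predicates rather than matching keywords.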
Keywords :
Bayes methods; information retrieval; knowledge acquisition; learning (artificial intelligence); medical administrative data processing; pattern classification; random functions; text analysis; Naive Bayes classification; coexpression relationship extraction; dynamic conditional random field; gene information; published literature; supervised machine learning; text analysis; word matching; Classification algorithms; Data mining; Feature extraction; Hidden Markov models; Machine learning; Testing; Training; Dynamic Conditional Random Fields; Gene coexpression; Machine learning; Relationship extraction;
Conference_Title :
Information Reuse and Integration (IRI), 2010 IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-8097-5
DOI :
10.1109/IRI.2010.5558956